ABSTRACT
network. Migration mechanisms allow object relocation among different nodes, and permit local caching of
code. A low cost process control system based on fast-allocated contexts allows parallelism at a significantly 
fine grain (on the order of 30 instructions per task). 

The system services are developed in detail, and may be of interest to other designers of fine grain, 
Chapter 1



Introduction 



I am the people — the mob — the crowd — the mass
Do you know that all the great work of the world is done through me? 

— Carl Sandburg, in I Am the People, the Mob (1916) 

Power is the great aphrodisiac. 

— in The New York Times (January 19, 1971) 

Concurrent processing is becoming a progressively more popular field in computer
science. The vision of harnessing previously undreamt-of computational power at a reasonable
cost is leading the drive. By connecting many moderately powerful microprocessors in a
communications medium, system designers hope to take advantage of the collective
power of the architecture to solve tasks that were previously time- or cost-prohibitive.

Unfortunately, the eager concurrent system designer soon finds that many issues 
are still unresolved. Though people have a fairly good grasp of ways to build successful 
sequential machines, it is less clear how to build optimal, or even acceptable concurrent 
systems. The designer is soon faced by a barrage of questions that are difficult to answer.
"What grain of parallelism should be supported?" "What level of functionality should the
processors provide?" "How should the processors communicate?" "How tightly coupled
should the processors be?" "How should memory be managed?" "How should the load be
distributed?" Many research groups are attempting to answer these questions at this very
moment.

Some insight into concurrent architectures has been gained over the years, and
the current directions of research reflect the knowledge gained. Multicomputer networks
(sometimes called "ensemble machines") are one direction that concurrent systems research
has taken. This genre of machine connects relatively conventional microprocessors via an
automatically routed network. The design is attractive because it builds on well
understood sequential processor technology for the processing nodes, and the performance of
the system can grow proportionately with the number of processors¹, providing scalability.

For the past two years, the Concurrent VLSI Architecture Group at M.I.T. has been
designing a concurrent processing network, christened the Jellybean Machine, under the
direction of Professor William Dally [Dal86c]. The goal of the Jellybean Machine project is
to design a scalable concurrent processor out of low-priced (jellybean) parts that efficiently
supports an object-oriented execution model. The processor is targeted at both symbolic
and numeric applications, and will be programmed in high-level, object-oriented languages.
We hope it will serve as a successful example and a test bed for advanced concurrent systems
research.

1.1 Scope of Thesis 

This thesis describes the design and implementation of an operating system prototype
for the J-Machine. The operating system was required to support a global namespace across
the distributed processors, allocate memory in an object-based storage model, support
inter-processor communication, and provide system services for controlling code execution,
object migration, and an object-oriented calling model. It also provided a perch from which more
advanced issues in system design could be studied.

¹ At least up to some point.

1.2 Highlights of Contributions 

In the course of the design of the J-Machine operating system, several ideas were developed 
that may be of special interest to the designer of multicomputer networks. 

• In section 3.4, I describe a virtual addressing system that resolves object names
across distributed nodes by a mechanism known as hometown addressing. This scheme
delegates to object birthnodes the responsibility for knowing current object residences,
permitting object migration. An accompanying mechanism of "hints" is provided to
improve performance.

• To simplify the hardware with minimal cost in flexibility, we have developed an explicit,
one-time virtual translation scheme via the XLATE machine instruction, which
converts a virtual address to a physical one. Retranslation is provided for automatically
by fault handlers.

• Chapter 5 describes a low-overhead code execution model that supports inexpensive
remote procedure calls, local caching of code, and convenient suspension and resumption
of processes.

• Section 5.4 describes a system for fast context creation that involves the reuse of old
context objects. This is an important optimization, motivated by the short lifetime of
contexts and the high frequency of context allocation.

• Section 5.6 outlines a simple and fast resource distribution mechanism that limits
bottlenecks and cross-network traffic by dynamically creating a type distribution tree
for the resource.
1.3 A Closer Look At The Jellybean Machine

[Figure: network of MDP processing nodes]

The J-Machine is composed of many custom RISC microprocessors called Message-Driven
Processors, or MDPs. These processing elements have small local memories and are connected
in a loosely coupled network. Inter-node communication is provided via message
sends that are automatically routed to the proper destination nodes. A virtual, object-based
memory abstraction is built over the distributed nodes, providing a uniform global
namespace. Various levels of low-cost execution control provide a reasonably fine grain
of concurrency (on the order of 30-instruction procedures). An object-oriented execution
model is built upon this fine-grain execution model. The rest of the system implements
miscellaneous system services and mechanisms to improve performance.
1.4 Background

Concurrent architecture design has been seriously studied for at least the past fifteen years,
but there is still much to be learned. The various visions of machines, operating systems,
and target applications are so diverse that few definitive statements can be made.

We see SIMD parallelism, promoted by vector operations as seen in the Cray. More 
complicated architectures like the Connection Machine [HU85], and systolic array processors 
like the Warp [Kun82] are alternative approaches, providing fine-grain concurrency with 
repetitive processing while permitting reconfiguration. MIMD architectures are just as 
diverse. There are extremely fine-grain dataflow machines like the Manchester Machine,
Sigma-1, and the MIT Tagged-Token Dataflow Machine [Aea80]; bus-based shared memory
architectures like the IBM RP3, Inmos Transputer, and C.mmp [WLH81]; multicomputer
networks like the Cosmic Cube [Sei85] and Cm* [OSS80]; and distributed systems like System
R* [Lin80].

The Jellybean Machine, while borrowing ideas from successful research endeavors, 
has goals unique enough to gain a somewhat different character from other machines of 
its genre. It communicates via message passing and addresses only local memory, as in 
the Cosmic Cube [Sei85] and the Medusa system [OSS80]. On the other hand, these two 
systems control execution by a system of pipes and locks, where processes wait for data to 
arrive via messages. The J-Machine, instead, uses message sends to schedule processes, and 
not to provide socket-to-socket communication. State manipulation doesn't involve explicit 
connections between running processes. Instead, return values are propagated around to 
slots in contexts and code is executed when results arrive in a more "functional" manner. 

Many systems also have virtual memory, and some systems use an object- or segment-based
storage model [WLH81], as does the J-Machine, but the emphasis is slightly different
in our design. Where most systems use a virtually addressed, multi-level memory system
to expand primary memory and provide relative address mapping, the J-Machine uses a
virtual addressing system to provide a global namespace across all nodes and to provide
convenient access to objects as the primitive memory metric. This is more similar to large,
complex distributed systems such as IBM's distributed database, System R* [Lin80], than to
conventional parallel processors.

Finally, the J-Machine targets itself to a high-level programming environment. The
RISC processing node, called the Message-Driven Processor [HT88], provides a fast, powerful
substrate for the execution of high-level languages, such as Smalltalk. There are several
architectures designed for the efficient execution of high-level language applications, such
as the Symbolics Lisp Machine and the SOAR Smalltalk processor [Ung87], but very little
work has been done targeting concurrent processors to high-level languages.

1.5 Organization 

The rest of this report will discuss the structure of the Jellybean system. Chapter 2 provides
a high-level layering of the Jellybean system, from single processing node hardware to the
high-level programming of the entire concurrent processing network. Chapter 3 describes
the memory management and addressing system. Chapter 4 discusses the machine as a
distributed system supporting object migration to balance load. Chapter 5 explains code
execution on the method level, and chapter 6 details the object-oriented calling extensions.
Storage reclamation issues are introduced in chapter 7. Chapter 8 discusses some of the services
provided to support high-level language constructs and to control code execution. Chapter
9 describes the prototype operating system implementation, noting its successful as well as
not-so-successful features, and discussing some of the difficulties and quirks faced by the
system designer. The report concludes with a performance evaluation and summary in
chapters 10 and 11.



Chapter 2 



The Execution Model of the 
Jellybean Machine 



These unhappy times call for the building of plans ... 
that build from the bottom up and not from the top down 

— Franklin Delano Roosevelt, in his April 17, 1932 Radio Address 



The Jellybean Operating System Software (JOSS) is built in a layered manner where
each layer provides a different model of functionality to the machine. Figure 2.1
describes this layering and the new functionality each layer provides to the entire system.

At the bottom of the figure lie the base processor and boot code. At this stage,
the processing node can be initialized, and can run independently as a limited microprocessor.
The addition of system call and fault handlers provides a level of system services
and robustness to the microprocessor, allowing it to allocate memory in an object-based,
virtually addressed manner, and to handle various types of exceptional conditions at run
time. These first two levels of the Jellybean system build up the abstract processing node
capable of executing machine code and performing a set of system services.

Figure 2.1: Layering of Jellybean System

Concurrency is provided as the next level of functionality by the introduction of 
primitive message handlers. Each processing node has the ability to send messages to any 
other node, where a message is simply a physical address to start running on a foreign node, 
followed by routine-specific data. Thus, a Jellybean primitive message is actually just a way 
of changing a program counter of a remote node. A set of common operations can be placed 
in identical physical memory locations on each node, so that an operation can be run on any 
node by mailing that routine's address to the node. The operating system provides a small 
set of primitive message handlers to perform common operations which reside in the same 
locations on each node. With this small set of locked-down routines, the machine gains the 
ability to compute concurrently, to use a global addressing abstraction over the physically 
distributed memories, and to perform some amount of object migration and other control 
of resources. 

Two primitive message handlers are special, in that other system services are
built on top of them. The CALL message handler provides a mechanism for starting code
contained in virtually addressed relocatable objects, rather than just code that resides at
locked-down physical addresses. This provides a convenient way of packaging objects and
supporting remote procedure calls. The SEND message handler takes the code execution
mechanism to an even higher level, and provides for a dispatch-on-type calling model as
used in object-oriented systems like Flavors or Smalltalk.

The final two layers of the system are the interfaces for the programming models.
Under this highest level of abstraction, the Jellybean Machine appears to the user as a system
for running high-level languages like Smalltalk.

The rest of this chapter will go into the abstractions in more detail, describing what 
functionality each level of the machine provides. It may be helpful to refer back to figure 
2.1 as you read the following sections. 
2.1 The Processing Node

Each node of the Jellybean multiprocessor (a Message-Driven Processor) is a tagged-architecture
microprocessor with a small on-chip memory and separate register sets for
operating at two priority levels.

2.1.1 Machine Code 

The machine code interpreted by a Message-Driven Processor (MDP) is a simple three-operand
instruction set [HT88]. Code is executed sequentially, and changes in control are provided
by simple conditional and unconditional branches. The instruction stream is accessed via
two registers, one that points at the base of the code block (A0), and one that indicates
the current offset into this block (IP).

2.1.2 System Calls 

The processor also has a small fixed-length stack, and a mechanism to make system calls.
This provides us with the ability to transfer control to common subroutines, and easily restore
execution upon return. The addition of the system call machinery gives us the ability to
provide several extensions to the processor in terms of system services written in machine
code. Heap management and an object-based memory allocation model are provided with
system calls, as are the mechanisms to address these objects with relocatable, virtual IDs.

2.1.3 Fault Handlers 

Similar to system calls, the MDP also contains a fault handler table providing software
routines to run when instructions fault because of various exception conditions (tag mismatches,
addressing past a segment, integer overflow, translation buffer lookup miss, etc.).
When a fault occurs, the IP is pushed onto the stack, and the appropriate fault routine
(found in the exception vector table) is run. The address of each fault handler is placed
in the exception vector table by software initialization. The addition of the fault handlers
gives us several advantages in our quest for an object-oriented concurrent processor. We can
use tag checking to support optimistic code generation and a type of "generic operation"
approach on the machine code level. The fault handlers also give us the ability to efficiently
implement virtual ID lookup via the XLATE instruction. The fault handlers will be
described in more detail later when the entire system has been more thoroughly explained.

Since both the system calls and fault handlers are supported by a software initialized 
vector table, the processor can be "reshaped" into a different type of machine by replacing 
the ROM code that sets up this table. Only the instruction set is fixed, allowing the MDP 
processing node to be used as a basis for various alternative concurrent processing system 
paradigms. 

2.1.4 The Basic Node of Computation 

With what we have described so far, our processor is a sequential machine, able to
execute at one of two priority levels. It refers to its instruction stream using physical memory
base and offset registers. The addition of the system calls provides an interface to OS
services, such as those to allocate memory, generate virtual object IDs, and manage object
ID to physical address translation. The fault handlers permit us to develop "optimistic"
code, where a normal, error-free execution will proceed rapidly, and we only pay the price of
software execution if an error condition occurs. The fault handlers are also used to support
a fast virtual namespace, where translation can be as fast as the XLATE instruction.

The sum is a flexible, object-based microprocessor that will serve as our basic node 
of computation as we venture into the realm of concurrency. 
2.2 The Concurrent Processor Model

By providing mechanisms for node-to-node communication, our machine becomes a multiprocessor,
called the Jellybean Machine. Many MDP processing nodes (as well as other
potential nodes such as floating point processors and memory nodes) are connected together
in a network. Communication between the nodes is provided by the MDP SEND instruction,
which injects messages into the network. The messages are routed by routing hardware to
the message queues on the destination node.

A message received by an MDP processing node consists of two parts: a message
header, which contains the address of the primitive message handler to run, and a sequence
of message-specific data words. The header of the message acts in effect like a process
descriptor, providing efficient message execution. When a message arrives at the specified
node, it lands in the destination node's queue. The queue acts as a FIFO scheduler of
primitive message processes. When the message moves to the head of the queue, the MDP
executes the message by setting the instruction pointer register to point to the primitive
message handler whose address is in the header of the message.
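The header-plus-data message format and the FIFO queue described above can be sketched in a few lines of Python. This is only an illustrative model, not the MDP implementation: the `Node` class, the handler address `ECHO_ADDR`, and the `echo_handler` routine are all hypothetical names introduced for the example.

```python
from collections import deque

class Node:
    """Toy model of one processing node: a FIFO queue of messages plus a
    table of primitive handlers keyed by a (hypothetical) physical address."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.queue = deque()   # incoming messages, served first-in first-out
        self.handlers = {}     # physical address -> handler routine
        self.log = []          # records what the handlers did

    def install(self, address, handler):
        self.handlers[address] = handler

    def receive(self, message):
        self.queue.append(message)

    def run(self):
        while self.queue:
            header, *data = self.queue.popleft()
            # Analogue of setting the instruction pointer from the header:
            # jump to the handler whose address the header names.
            self.handlers[header](self, *data)

ECHO_ADDR = 0x40  # assumed location, imagined identical on every node

def echo_handler(node, *words):
    node.log.append(words)

node = Node(1)
node.install(ECHO_ADDR, echo_handler)
node.receive([ECHO_ADDR, 7, 8])
node.receive([ECHO_ADDR, 9])
node.run()
```

The point of the sketch is that the queue itself acts as the scheduler: no separate process table is consulted before a handler starts.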

Several useful system services are written as primitive message handlers. Examples
of primitive message handlers include those to make a new object on a node (NEW_MSG)
and to request a copy of a method from a node (METHOD_REQUEST_MSG).

With the addition of primitive messages, we have the ability to process concurrently, 
and to support a distributed namespace. We can now extend our virtual memory system 
to support naming of objects, not just in the local memory, but on any node in the entire 
network. With a distributed namespace, we gain flexibility of resources. We can migrate 
objects as we need them to balance load and to free up memory. 
2.2.1 Methods and the CALL Message

Up to this point, we have only been able to run foreign code that resides at fixed physical
locations. We desire a more flexible mechanism for dealing with blocks of code, such as those
that will be output by compilers. Since we already have an object-based storage model,
it would be very convenient to store code routines in objects and provide a mechanism for
their execution. We call code routines stored in virtually addressed, relocatable objects
methods, to differentiate them from physical, locked-down code sequences. We provide a
mechanism to start these methods executing by writing a primitive message handler called
the CALL message handler. When a CALL_MSG starts executing on a node, it runs the
method indicated in the message argument. This allows us to have a flexible system of
remote procedure calls.

2.2.2 SENDing Selectors to Objects 

The final operating system layer in our quest for an object-oriented execution model is
the SEND_MSG message handler. A SEND_MSG consists of a selected generic operation,
represented by a unique symbol called a selector, followed by the object(s) that the selector
acts upon. If we wanted to send the DRAW selector to an object (say a triangle), we
would SEND a SEND_MSG message to the node the triangle object resides on, passing the
selector DRAW and the virtual address of the triangle object receiving the selector (called
the receiver). When the SEND_MSG handler is executed, it determines the appropriate
method to run, and then remotely calls the procedure by sending a CALL_MSG message
to this method, which then draws the triangle.

In order for this system to work, it is necessary to maintain certain system tables
that map pairs of selectors and object classes to the virtual IDs of methods that perform
the desired operation. It is also necessary to ensure that semantically identical selector
operations get the same selector symbol. In other words, all PLUS operations must get the
same symbol representing +. The exact mechanisms of the class/selector system will be
described in more detail in chapter 6.
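As a rough illustration of the dispatch-on-type idea, the following Python sketch maps a (selector, class) pair to a method ID and then "calls" that method. The table layout, the method IDs, and the functions `send_msg` and `call_msg` are assumptions made for this example; the actual class/selector tables are the subject of chapter 6.

```python
# (selector, class) -> method ID. All names here are illustrative only.
method_table = {
    ("DRAW", "Triangle"): "draw_triangle_method",
    ("DRAW", "Circle"): "draw_circle_method",
}

# Method ID -> code. In the real system this would be a relocatable code object.
methods = {
    "draw_triangle_method": lambda receiver: f"drew triangle {receiver['id']}",
    "draw_circle_method": lambda receiver: f"drew circle {receiver['id']}",
}

def call_msg(method_id, *args):
    """Stand-in for CALL_MSG: run the method named by its ID."""
    return methods[method_id](*args)

def send_msg(selector, receiver):
    """Stand-in for SEND_MSG: dispatch on the receiver's class, then CALL."""
    method_id = method_table[(selector, receiver["class"])]
    return call_msg(method_id, receiver)

triangle = {"id": 17, "class": "Triangle"}
result = send_msg("DRAW", triangle)
```

Because the table is keyed on the selector symbol, the sketch only works if every DRAW operation shares one symbol, which is the uniqueness requirement stated above.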

2.3 High Level Language Model 

For the final part of our tour of the Jellybean Machine, let us step back once more, and 
view the machine from the perspective of the programming languages that will be used to 
write user programs. 

2.3.1 Intermediate Code 

To provide a uniform target language for compilers, we have specified an intermediate
language called i-code. This language has a simple set of operations, and a simple manner of
referencing operands. By passing the i-code through a code generator and a linker/loader,
we can store actual MDP machine code on nodes. The i-code level of the system provides a
convenient entry point for various compilers that requires no knowledge of the underlying
layers. All interaction is via the protected subsystem of the i-code interface. This interface,
in effect, provides an abstract i-code machine that can be of use in many different machine
configurations. Implementations of this interface on different machine architectures would
provide a convenient way to reuse compilation tools and compare system performance.

2.3.2 User Languages 

The user language model is what would be seen by the user of the Jellybean Machine. The user
would be faced with the language interaction shell and would see none of the internal layers
that compose the system. The currently supported user language is a prefix notation form
of Concurrent Smalltalk [DC]. Other languages, such as a Lisp with Flavors, should also be
possible.



Chapter 3 

Memory Management and 
Addressing System 



Work without hope draws nectar in a sieve 
And hope without an object cannot live 

— Samuel Taylor Coleridge, in Work Without Hope 

Oh call it by some better name 
For friendship sounds too cold. 

— Thomas Moore in Ballads and Songs: Oh Call It by Some Better Name 



The Jellybean Machine, targeted for object-oriented applications, needs to have an
object-based storage model. This chapter sketches the mechanisms that interact to provide
this model. They consist of two parts: (1) the services to allocate and
deallocate contiguous blocks of physical memory, and (2) the virtual addressing abstractions
that make objects the basic unit of storage. Virtual addressing allows object relocation
and provides a way to reference storage on foreign nodes. Virtual naming and physical
allocation systems combine to form an object-based programming system.
Figure 3.1: Schematic Model of the Memory System



At the heart of the object-based system is the NEW system call, which creates a
new object. This routine utilizes the three subsystems of the object system: the translation
manager, the name manager, and the memory manager. The interaction of these subsystems is
shown in figure 3.1.
3.1 "Freetop" Contiguous Heap Allocation

Each node of a Jellybean Machine has its own local memory that can be accessed very 
rapidly. Part of this local memory is reserved as a heap to allocate blocks of memory from. 
Heap allocation is done in a straightforward "freetop-next" manner. Memory is allocated 
starting from the current top of free memory, and the freetop pointer is moved past the 
block allocated. The ALLOC system call handles the allocation requests. 
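A minimal Python sketch of this "freetop-next" scheme follows. It assumes a heap addressed in words starting at 0; the `Heap` class and its `alloc` method are illustrative names and are not the ALLOC system call itself.

```python
class Heap:
    """Bump allocator: memory is handed out from the current top of free space."""

    def __init__(self, size):
        self.size = size
        self.freetop = 0  # address of the first free word

    def alloc(self, length):
        """Return the base address of a new block of `length` words."""
        if length <= 0:
            raise ValueError("length must be positive")
        if self.freetop + length > self.size:
            raise MemoryError("heap exhausted")
        base = self.freetop
        self.freetop += length  # move freetop past the block just allocated
        return base

heap = Heap(16)
a = heap.alloc(4)
b = heap.alloc(6)
```

Allocation costs one comparison and one addition, which is what makes the scheme attractive; the price is that freed blocks are not reused until the heap is compacted.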

3.2 Compaction is Fast 

Deletion of objects fragments the heap, leaving unused "holes". We reclaim this
storage by sweeping objects down toward the base of the heap to fill up the blank space,
with the freetop following accordingly. Since each local memory is small and fast, and
each processor can sweep in parallel, compaction takes very little time. Figure 3.2 shows a
process of heap allocation, deletion, and compaction.
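The sweep can be sketched as a function that slides live blocks toward the heap base and reports where each one ended up. This is a simplified model under stated assumptions: blocks are given as (base, length) pairs, the heap base is address 0, and the function name `compact` is hypothetical. A real implementation would also copy the memory contents and update the translation table.

```python
def compact(blocks):
    """Slide live blocks toward the heap base.

    `blocks` is a list of (base, length) pairs for the live blocks.
    Returns (relocation map old_base -> new_base, new freetop).
    """
    relocation = {}
    freetop = 0
    for base, length in sorted(blocks):  # lowest address first
        relocation[base] = freetop       # block moves down to the current freetop
        freetop += length
    return relocation, freetop

# Three blocks were allocated at 0, 4 and 10; the middle one was deleted.
live = [(0, 4), (10, 3)]
relocation, freetop = compact(live)
```

In this example the block at address 10 moves down to address 4, and the freetop drops from 13 to 7.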

3.3 Physical Base/Length Addressing 

Blocks of memory are described by physical base/length values supported by the processor's 
primitive ADDR data type. The base is the starting address of the block of memory, and the 
length is used for access bounds checking. The format of an ADDR tagged value is shown 
in figure 3.3. The tag of the physical address word is a unique number ADDR representing 
a physical address value. The R bit is used to specify that an address value points to a 
relocatable object. The I bit specifies that the address is now invalid. Both of these bits 
are used for the implementation of virtual addressing. 
Figure 3.5: A Virtual Address Word (ID) Format

format of this virtual ID is shown in figure 3.5. There are also several utility routines used
to manage the virtual -> physical translation table (called the Birth/Residence Address
Table, or BRAT). These routines add, look up, and remove bindings from the translation
table. They are implemented by the extended system calls BRAT_ENTER, BRAT_XLATE,
and BRAT_PURGE respectively. Finally, we provide the NEW system call to allocate and
install a new object. This service allocates physical memory, generates a virtual ID, installs
the virtual -> physical binding in the BRAT, and returns both the ID and the address. The
NEW system call is to the virtual addressing model as ALLOC is to the physical addressing
model.
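The three steps performed by NEW (allocate, generate an ID, install the binding) can be sketched as follows. The ID layout used here, a (birth node, serial number) pair, is an assumption made for illustration and is not taken from figure 3.5; `NodeMemory` and its methods are likewise hypothetical.

```python
import itertools

class NodeMemory:
    """Toy model of one node's heap plus its translation table (BRAT)."""

    def __init__(self, node_number, heap_size):
        self.node_number = node_number
        self.heap_size = heap_size
        self.freetop = 0
        self.brat = {}                    # virtual ID -> (base, length)
        self._serial = itertools.count()  # per-node serial numbers

    def alloc(self, length):
        if self.freetop + length > self.heap_size:
            raise MemoryError("heap exhausted")
        base = self.freetop
        self.freetop += length
        return base

    def new(self, length):
        """Allocate a block, mint an ID, record the binding, return both."""
        base = self.alloc(length)
        # Assumed ID layout: (birth node, serial number).
        vid = (self.node_number, next(self._serial))
        self.brat[vid] = (base, length)   # analogue of the BRAT_ENTER step
        return vid, (base, length)

mem = NodeMemory(node_number=3, heap_size=64)
vid, addr = mem.new(8)
```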

3.4.3 Translation Buffer 



To speed up translation, each processing node has a 2-way set-associative translation buffer, 
and the accompanying ENTER, XLATE, and PURGE machine instructions. The XLATE 
instruction will fault if no binding is found in the cache, and a software exception handler 
will be run to resolve the name. 
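The interplay between the translation buffer and the software handler can be modelled roughly as below. The number of sets, the eviction policy (drop the older entry in a full set), and every class and function name are assumptions for the sketch; only the two-way set-associative shape and the fault-on-miss behaviour come from the text.

```python
class TranslationMiss(Exception):
    """Raised where the hardware XLATE instruction would fault."""

class TranslationBuffer:
    def __init__(self, num_sets=4):
        # Each set holds at most two (key, value) entries (two-way associative).
        self.sets = [[] for _ in range(num_sets)]

    def _set_for(self, key):
        return self.sets[hash(key) % len(self.sets)]

    def enter(self, key, value):
        entries = self._set_for(key)
        entries[:] = [e for e in entries if e[0] != key]
        if len(entries) == 2:
            entries.pop(0)  # evict the older entry (assumed policy)
        entries.append((key, value))

    def xlate(self, key):
        for k, v in self._set_for(key):
            if k == key:
                return v
        raise TranslationMiss(key)

    def purge(self, key):
        entries = self._set_for(key)
        entries[:] = [e for e in entries if e[0] != key]

def translate(tb, brat, key):
    """Fast path through the buffer; on a miss, fall back to the full table."""
    try:
        return tb.xlate(key)
    except TranslationMiss:
        value = brat[key]     # software handler consults the translation table
        tb.enter(key, value)  # cache the binding for next time
        return value

tb = TranslationBuffer()
brat = {10: (0, 4), 11: (4, 6)}
first = translate(tb, brat, 10)  # misses, then fills the buffer
cached = tb.xlate(10)            # now hits without the handler
```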
Figure 3.6: Format of the Translation Buffer




3.4.4 Automatic Retranslation 

To support maximum efficiency in normal case situations, the processing node provides an 
"invalid" bit in each address (A) register. If this bit is set, it signifies that the ID and A 
register have values that are no longer consistent. Any access of an invalid A register will 
cause a fault handler to be run which will retranslate the ID register into the A register 
and continue. This way we can be "lazy" and retranslate invalid bindings only if needed. 
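The lazy scheme can be illustrated with a small model of an ID/address register pair. The `AddressRegister` class and its counter are hypothetical; the check inside `read_base` stands in for the hardware fault and its handler.

```python
class AddressRegister:
    """Toy ID/A register pair with an invalid bit."""

    def __init__(self, vid, table):
        self.vid = vid          # ID register contents
        self.table = table      # virtual ID -> physical base
        self.base = table[vid]  # A register contents
        self.invalid = False
        self.retranslations = 0

    def invalidate(self):
        self.invalid = True

    def read_base(self):
        if self.invalid:  # the "fault": refresh the A register from the table
            self.base = self.table[self.vid]
            self.invalid = False
            self.retranslations += 1
        return self.base

table = {"obj": 100}
reg = AddressRegister("obj", table)
table["obj"] = 40  # the object moved, for example during compaction
reg.invalidate()
reg.invalidate()   # a second move before any access costs nothing extra
base = reg.read_base()
```

However many times the register is invalidated, the translation is repeated at most once, and only if the register is actually used.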

3.5 Summary 

Physical block allocation is used to reserve segments of memory. Virtual IDs are associated
with these blocks of memory, and bindings are formed, to provide an "object-based"
allocation model. This object allocation model provides the following benefits:

• An abstract memory model, where "objects" are the primitive metric of storage rather
than physical addresses.

• A location independent memory model with indirection through a translation table, 
allowing ease of relocation. 

• The ability to represent the data types of objects. 

• The introduction of a global namespace where we can refer to objects residing on any 
node of the network. 



Chapter 4 



Distributed System Support 



I pity the man who can travel from Dan
to Beersheba and cry, 'Tis all barren!

— Laurence Sterne, in A Sentimental Journey (1768)



In the previous chapter we developed an object-based allocation model and a global
naming system. With this functionality, we gain much greater flexibility. We take this
system one step further in this chapter, as we describe a mechanism to migrate objects
from node to node. This added ability requires a few extensions to the virtual naming
model presented in the previous chapter.

4.1 The Idea 

In the previous naming model, virtual IDs were bound to physical addresses. Since objects
were not allowed to migrate, they were forced to always reside on their birthnode. Now that
objects are allowed to emigrate to different nodes, we need to expand our name resolution
system. In addition to virtual -> physical bindings we add a virtual -> node-number
binding, semantically representing a "hint" that the object in question now resides on a
different node. Figure 4.1 shows that node #1 has a hint that an object is on node #2.

Figure 4.1: An Example of Hints



4.2 Chaining of Hints 

These node number "hints" indicate another node to look on for the object in question. The 
current implementation allows chaining of hints (although cycles will never form). If we ever 
follow a path of hints and find no binding for the object ID, we then query the birthnode 
which is required to have a path to the object in question. Figure 4.2 is a snapshot of a 
system where a chain of hints has formed to an object. 

A question then arises as to how long to let these chains of hints be. Some distributed 
systems, such as System R* [Lin80], only allow paths of length 1, i.e. one hint. If the 
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object is not one hint transition away, the system then defaults to the birthnode where 
the location of the object is found, and the previous incorrect hint is updated. However, 
in our system we choose to have multiple hints because objects may migrate quite a bit, 
and this would increase the number of birthnode accesses. Performance could significantly 
degrade if a popular object moved quite a bit (as we would expect popular objects to do). 
If we notice in later performance experiments that chains of hints become commonplace,
adding latency and unnecessary network traffic, we can adopt one of two solutions: (1) only
allow one hint, or (2) collect and update old hints periodically.
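
The chain-following behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each node's translation table is modelled as a dictionary, the function name `locate` and the `("addr", ...)` / `("hint", ...)` tags are hypothetical, and the birthnode is passed in explicitly rather than decoded from the ID.

```python
def locate(tables, start_node, obj_id, birthnode):
    """Follow hints from start_node; return (residence_node, hops_taken).

    tables[n] is node n's translation table (hypothetical layout): it maps an
    object ID to ("addr", physical_address) or ("hint", node_number).
    """
    node, hops, fell_back = start_node, 0, False
    while True:
        binding = tables[node].get(obj_id)
        if binding is None:
            # Broken chain: the birthnode is required to have a path.
            if fell_back or node == birthnode:
                raise LookupError("birthnode has no path to the object")
            node, fell_back = birthnode, True
        elif binding[0] == "addr":
            return node, hops          # the object resides here
        else:
            node = binding[1]          # follow the hint one more step
        hops += 1


# Node 1 is the birthnode; the object has moved 1 -> 2 -> 3, leaving hints.
tables = {
    1: {"X": ("hint", 2)},
    2: {"X": ("hint", 3)},
    3: {"X": ("addr", 0x40)},
    4: {},                             # node 4 has never heard of X
}
```

With these tables, a lookup starting at node 4 takes three hops: one to the birthnode, then two along the chain of hints.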

4.3 Calculating Likely Nodes From Object IDs 

The operating system provides a system call for finding a likely node that an object resides
on. This ID_TO_NODE call takes the virtual ID of the object and returns a node number,
using the algorithm charted in figure 4.3. It works in the following way. The virtual
ID is looked up in the translation table. If it is not there, we have no idea where the object
is, so we return the birthnode as the node to check. If there is a binding, but the binding is to a hint (an integer
value), we return this hint as the probable residence node. Finally, if the binding is to a
physical address, the object is local, and the local node number is returned.
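
As a minimal sketch of this lookup, assume the translation table is a dictionary, a hint is a plain integer, and a physical address is wrapped in a small `PhysAddr` type so the two kinds of binding can be told apart. The names `PhysAddr`, `LOCAL_NODE`, and `birthnode_of`, and the idea that an ID is a (birthnode, serial) pair, are assumptions made for illustration only.

```python
from dataclasses import dataclass

LOCAL_NODE = 7  # hypothetical number of the node running this code


@dataclass(frozen=True)
class PhysAddr:
    """Stand-in for a physical-address binding (a hint is a plain int)."""
    word: int


def birthnode_of(obj_id):
    # Assumption: the ID is a (birthnode, serial) pair.
    return obj_id[0]


def id_to_node(translation_table, obj_id):
    binding = translation_table.get(obj_id)
    if binding is None:
        return birthnode_of(obj_id)     # no idea: try the birthnode
    if isinstance(binding, int):
        return binding                  # hint: probable residence node
    return LOCAL_NODE                   # physical address: object is local


table = {(2, 10): PhysAddr(0x1F0), (3, 11): 5}
```

Here `id_to_node(table, (2, 10))` names the local node, `(3, 11)` yields the hinted node 5, and an unknown ID such as `(4, 12)` falls back on its birthnode, node 4.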

4.4 Virtual To Physical Translations In The Migrant Object World

Now that objects are allowed to wander aimlessly across the nodes of the Jellybean Machine, 
virtual to physical address translations are necessarily slightly more sophisticated. Three 
conditions can occur when we attempt to translate a virtual ID into a physical address. 

1. We find a physical address value for the binding 

2. We find a hint to where the object currently resides 
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3. We find no binding for the object 

Case 1 is the normal situation. The physical address associated with the object ID is 
returned. Case 2 implies that the object is rumored to be on a foreign node. We then 
send a request to this node asking that the object be shipped here for processing, and we 
suspend our process onto a wait list. Case 3 occurs when a node has no idea where an 
object resides. In this case, we send a request to the birthnode asking for the object. If the 
birthnode doesn't know where an object is, it loops, mailing messages to itself, assuming 
the object is in a state of transition somewhere. 

4.5 Bouncing Objects 

Note that this method of finding data objects may cause them to bounce around from node 
to node, as different processors wish to compute on them. This is the direct result of several 
design decisions: (1) each processor executes only one task at a time, (2) memory is not 
shared among processors, (3) mutable data objects are not cached, and (4) an object's data 
lies entirely on one node. The first and second decisions are fundamental to the design of 
our machine. We chose the grain size and memory model to provide a moderately fine
grain, highly scalable processor. We chose not to do object caching because it is expensive
to do in software, and is difficult on a network based memory model. It may be possible to
provide coherent caching in the future, however. The final restriction, that an object's state
is contained on one node only, is for simplicity's sake, and can be at least partially lifted by
the introduction of "distributed objects" described in a later section.

So, with these characteristics in mind, it becomes important for us to try to prevent 
unnecessary "pinging" of objects from node to node. One way this is done is by "sending 
work to the object" rather than "sending the object to the work". Unfortunately, this is 
difficult to do in the general case due to problems with transferring processor state. As a 
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compromise, we set the following policy. 

1. If we were sending a selector to an object, and the object is not local, we forward the 
selector to the location of the object 1 . 

2. If we were accessing a non-local, immutable object, we halt, saving our process state,
request a copy of the object, and restart execution when the copy arrives.

3. If we were accessing a non-local, mutable object, we halt, saving our process state,
move the object here, and restart when it arrives.

This policy reduces the severity of the "pinging" problem, because work tends to accumulate 
at the object, while at the same time, allowing the object to move if it has to. 
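
The three rules of this policy amount to a small decision function. The sketch below is illustrative only; the `Action` names and the boolean parameters are hypothetical and do not correspond to actual system calls.

```python
from enum import Enum


class Action(Enum):
    RUN_HERE = "object is local; proceed"
    FORWARD_SELECTOR = "send the work to the object"
    REQUEST_COPY = "suspend and fetch a copy"
    MOVE_OBJECT = "suspend and move the object here"


def remote_access_policy(is_local, sending_selector, is_mutable):
    if is_local:
        return Action.RUN_HERE
    if sending_selector:
        return Action.FORWARD_SELECTOR      # rule 1
    if not is_mutable:
        return Action.REQUEST_COPY          # rule 2
    return Action.MOVE_OBJECT               # rule 3
```

Only the last case actually moves an object, which is why work tends to accumulate at the object rather than the object chasing the work.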

4.6 Details About Object Migration 

This section formalizes the mechanisms provided to migrate objects. When we try to access 
a non-local object, we mail away to request a copy of the object or to move the object 
(depending on whether the object is immutable or mutable, respectively) 2 . When we wish 
to request a non-local object, the following steps are taken: 

1. The process state is saved in a context object, and the context is marked waiting
for the ID of the object being requested.

2. The context is placed in a resource wait table that indicates processes waiting on
resources.

3. A MIGRATE_OBJECT message is sent to the best guess residence of the object,
asking it to be migrated to the requesting node, and the process suspends, leaving the
processor free to execute the next message in the queue.

4. This MIGRATE_OBJECT message is forwarded down the chain of hints. If it lands on
a node with no binding for the ID in question, the search continues at the birthnode.
Finally, this message arrives at the node the object resides on, and the message handler
is run.



5. If the object in question is marked immovable, then the message is sent back to
the start of the queue; otherwise the message handler decides whether the object is
mutable or not, and acts accordingly.

• If the object is mutable, the bindings are removed from this node, the object is mailed in
an IMMIGRATE_OBJECT message back to the requesting node, and the object
is deleted.



1 The class/selector late-binding activation model is discussed in detail in chapter 6.
2 Since a process cannot be interrupted by a same priority message, it does not suffer from livelock and
can always make headway.
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• If the object is read-only, the data is mailed in an IMMIGRATE.COPY message 
back to the requesting node. 

6. These messages eventually arrive back at the requesting node. 

• When an IMMIGRATE_OBJECT message arrives, the message handler (1) allocates
the object, (2) marks the object unmovable (until it can update the birthnode,
to prevent a race condition where hint updates may occur out of sequence),
(3) copies the data into the object, (4) mails a NOW_RESIDING_AT message to
the previous node of residence, and (5) calls the RESOURCE_ARRIVED system
call, which will queue the restart of the waiting contexts.

• When an IMMIGRATE_COPY message arrives, the handler (1) allocates the object,
(2) marks the object header as a copy, (3) binds the old ID to this new object,
(4) copies the data into the object, and (5) calls the RESOURCE_ARRIVED
system call, which will queue the restart of the waiting contexts (copies can be
collected when storage runs low).

7. The NOW_RESIDING_AT message makes a hint from the current node to the new
node, and mails an UPDATE_BIRTHNODE message to the birthnode of the object,
telling it of the object's new location.

8. The UPDATE_BIRTHNODE message makes a hint to the new location and mails an
OBJECT_MOVABLE message to the location of the new object, passing its ID.

9. The OBJECT_MOVABLE message marks the object movable. Now the object is free
to move again.

Figure 4.4 shows an example of this process. 
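
To make the message sequence concrete, the following sketch simulates the migration of one mutable object between three nodes. It is a simplification under stated assumptions, not the system's implementation: all mail goes through a single in-order queue, only the mutable (move) path is modelled, and the `Node` class, the `run` function, and the tuple layout of messages are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, number):
        self.number = number
        self.table = {}         # obj_id -> ("obj", data) or ("hint", node)
        self.immovable = set()  # IDs pinned until the birthnode is updated
        self.waiting = {}       # obj_id -> names of suspended contexts
        self.restarted = []     # contexts queued for restart


def run(nodes, mail, birthnode, trace):
    """Deliver messages (dest, kind, obj_id, arg) until the queue is empty."""
    while mail:
        dest, kind, obj_id, arg = mail.popleft()
        node = nodes[dest]
        trace.append((dest, kind))
        if kind == "MIGRATE_OBJECT":                  # arg = requesting node
            binding = node.table.get(obj_id)
            if binding is None:                       # step 4: ask the birthnode
                mail.append((birthnode, kind, obj_id, arg))
            elif binding[0] == "hint":                # step 4: follow the hint
                mail.append((binding[1], kind, obj_id, arg))
            elif obj_id in node.immovable:            # step 5: retry later
                mail.append((dest, kind, obj_id, arg))
            else:                                     # step 5: ship the object
                del node.table[obj_id]
                mail.append((arg, "IMMIGRATE_OBJECT", obj_id, (binding[1], dest)))
        elif kind == "IMMIGRATE_OBJECT":              # arg = (data, old node)
            data, old_node = arg
            node.table[obj_id] = ("obj", data)
            node.immovable.add(obj_id)
            mail.append((old_node, "NOW_RESIDING_AT", obj_id, dest))
            node.restarted += node.waiting.pop(obj_id, [])  # RESOURCE_ARRIVED
        elif kind == "NOW_RESIDING_AT":               # arg = new node
            node.table[obj_id] = ("hint", arg)
            mail.append((birthnode, "UPDATE_BIRTHNODE", obj_id, arg))
        elif kind == "UPDATE_BIRTHNODE":              # arg = new node
            if arg != dest:                           # object may be back home
                node.table[obj_id] = ("hint", arg)
            mail.append((arg, "OBJECT_MOVABLE", obj_id, None))
        elif kind == "OBJECT_MOVABLE":
            node.immovable.discard(obj_id)


nodes = {n: Node(n) for n in (1, 2, 3)}
nodes[1].table["X"] = ("hint", 2)          # birthnode's hint
nodes[2].table["X"] = ("obj", [1, 2, 3])   # current residence
nodes[3].waiting["X"] = ["ctx-A"]          # steps 1-2: context suspended on X
trace = []
run(nodes, deque([(1, "MIGRATE_OBJECT", "X", 3)]), birthnode=1, trace=trace)
```

After the run, the object lives on node 3, nodes 1 and 2 both hold hints to node 3, and the suspended context has been queued for restart.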



4.7 Summary 

The addition of a mechanism for object migration adds much more flexibility to the Jellybean
system. Without imposing policy, the migration and copying system provides the
basic mechanism for resource sharing. To alleviate name resolution bottlenecks at object
birthnodes, I designed a system of cycle-free hints to indicate where objects currently lie. It
is not clear how long to allow these chains of hints to be. Long chains of hints would cause
unnecessary network traffic and increase latency. Having single hints would increase the
number of birthnode accesses and require mechanisms for removing old links. The system
currently supports chains of hints.
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Figure 4.4: Step-by-step Object Migration 



Chapter 5 

A Virtually Addressed Code 
Execution Model 



They shall mount up with wings as eagles; 

they shall run, and not be weary, and 

they shall walk, and not faint 

— The Holy Bible, Isaiah, 40:31 



At the most primitive level, we could execute physically addressed blocks of machine
code by directly setting the registers, or by sending primitive messages. Unfortunately,
we have no mechanism to allocate or relocate these blocks of code; they are physically
addressed and sedentary. This chapter presents the system mechanisms that interact to
provide a more flexible, but low overhead model for code execution by taking advantage of
the virtually-addressed, object-based storage model we developed in the last two chapters.

I will present (1) the advantages of an object-based code model, (2) the mechanisms 
for executing object-based code, (3) local caching of methods, (4) contexts, suspension, 
and waiting for resources, and (5) efficient ways of distributing code models across a large 
network. 




[Figure 5.1 layout: CALL Routine Address | Method ID | Optional Args]

Figure 5.1: Format of the CALL Message 



5.1 Taking Advantage of Object Storage 

By taking advantage of the object storage and naming system we developed, we are able 
to wrap threads of code inside objects and gain all of the benefits of this more powerful 
object-based abstraction, of which a few are: (1) dynamic allocation, (2) relocation, even
across nodes, and (3) convenient naming and name resolution. This view of code blocks as 
objects (or methods, which is what we call code blocks that are wrapped in objects) allows 
us to consider more advanced calling models, such as the ability to conveniently support 
remote procedure calls (RPCs) and the flexibility to "send the work to the data" rather 
than just the typical mechanism of "bringing the data to the work". 

5.2 An Overview of the CALL Message 



Ignoring for the moment the question of initially creating methods, let's concentrate on the 
mechanisms needed to execute them. The operating system provides a primitive message 
handler for a CALL message. To start a method running, we mail a CALL message to the 
node the method resides on 1 , passing as arguments the virtual ID of the method to execute, 

1 Since we build this on top of the virtual, distributed namespace model, we can use hints to make our
best guess where the method resides.
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and any data the method expects as parameters. The format of the CALL mesage is shown 
in figure 5.1. When the CALL message arrives at the node it first checks if the method is 
here. If so, the code is started. If not, rather than forward the message to the birthnode, 
we note that 

1. Methods are immutable, and therefore can be copied 

2. Certain methods might tend to be called often from many nodes 

and adopt a policy of copying the method to this node. This way we provide local copies
on many nodes (these can be periodically purged by some appropriate strategy to free up
memory).

Once the method is on the node where the CALL message arrived, the message handler can
start up the method. It does that by

• Translating the ID of the method into its physical address

• Placing this physical address of the code block in A0 2

• Placing a 2 in the IP register

These steps will start the processor executing instructions from the method, starting at the
third word. We skip the first two words of the method, because these hold object header
information. The steps of the CALL message are schematically charted in figure 5.2. If
the method somehow relocates on us while we were executing 3 , the process that relocated
the object will invalidate the A0 register. When our process starts again, it will fetch
an instruction through A0 and cause an invalid address fault. This will run an exception
handler to retranslate the method ID (in ID0) into the physical address (putting it in A0
again), and we will continue as if nothing had happened.
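
The startup and relocation-recovery steps can be sketched as follows. This is a toy model under stated assumptions, not the processor's actual behaviour: memory is a dictionary of words, an invalidated base register is represented by `None`, and the names `Processor`, `start_method`, and `fetch` are hypothetical.

```python
HEADER_WORDS = 2  # the first two words of every object hold header information


class Processor:
    def __init__(self):
        self.a0 = None   # base address of the running code block (None = invalid)
        self.id0 = None  # virtual ID of the running method
        self.ip = 0      # offset of the next instruction from a0


def start_method(cpu, translate, method_id):
    """Begin executing a method that is already present on this node."""
    cpu.id0 = method_id
    cpu.a0 = translate(method_id)
    cpu.ip = HEADER_WORDS


def fetch(cpu, memory, translate):
    """Fetch one instruction word, retranslating if a0 was invalidated."""
    if cpu.a0 is None:                  # stands in for the invalid address fault
        cpu.a0 = translate(cpu.id0)
    word = memory[cpu.a0 + cpu.ip]
    cpu.ip += 1
    return word


# A method with two header words and two instructions, first at address 100.
memory = {100: "hdr0", 101: "hdr1", 102: "insn0", 103: "insn1"}
where = {"M": 100}
cpu = Processor()
start_method(cpu, where.get, "M")
first = fetch(cpu, memory, where.get)

# The method is relocated to address 200 and the base register is invalidated.
for offset in range(4):
    memory[200 + offset] = memory.pop(100 + offset)
where["M"] = 200
cpu.a0 = None
second = fetch(cpu, memory, where.get)
```

Because the instruction pointer is an offset rather than an absolute address, execution resumes at the right instruction once the base has been retranslated.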



2 A0 always points to the base of the code currently executed, unless the processor is in absolute mode,
where this value is always treated as 0, regardless of what it holds. The IP register holds the relative offset of
the program counter within this code block starting at A0. (If we are in absolute mode, the IP register acts
in effect like an absolute address rather than a relative address, because absolute mode makes the processor
pretend the value of A0 is 0.)

3 This could be caused by heap compaction, or the method being migrated to another node to free up
space, among other reasons.
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Figure 5.2: Flowchart of the CALL Message Handler 
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5.3 Caching Method Copies 

Since method code is immutable, we can cache methods, just as we can cache other read-only 
data. To request a copy of a method we: 

1. Allocate a context object to hold our processor state, so we can restart later 

2. Copy the processor state into the context 

3. Place the context in the resource wait table indicating that our context is waiting on
this requested method

4. Mail off, requesting a copy of the method 

5. When the method arrives, it is placed on our node and our context is restarted 

These cached copies will have the copy bit set in the object header so that the storage 
reclaimer will know that this cached object is a duplicate, and can be purged if space is 
tight. Let's now look in a bit more detail at contexts and this resource wait table, two 
crucial mechanisms for supporting high level execution control. 

5.4 Contexts 

5.4.1 Why Do We Need Them? 

Contexts are just objects that hold the important state of the processor, so the current task
can be halted and later restarted where it left off. In addition, contexts can provide space
for local variables used in the task's computation.

5.4.2 How Do We Make Them? 

Contexts are allocated by the NEW_CONTEXT system call. The call takes as an argument
the number of additional variables needed, and it returns a context big enough to hold the
minimum necessary processor state plus the additional variables. When a process is done
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Figure 5.3: Structure of a Typical Context 
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with a context, it should explicitly deallocate it with the FREE-CONTEXT system call. 
Figure 5.3 shows the format of a typical context. 

As with all objects, the first two words are used by the object manager. The next 
three words are used to hold an offset to the processor state part of the context (for faster 
restarts), a pointer to the next context in a list of contexts, and a value indicating that the 
context is waiting on a particular resource. The context then contains some amount of user
reserved space followed by nine words of processor state. The minimal size of a context, with
no user space, is 14 words.

5.4.3 How Do We Make Them ... Quickly!? 

Since we expect contexts to be used very often, and since we want method startup costs to 
be small and methods to be short, we don't want a majority of our execution time to be 
spent allocating contexts. To accommodate these constraints, we reuse old contexts rather
than allocating new ones each time. When a context is deallocated, it is placed back on a
free context list. The next time a context is requested, we try to re-use one from the free
list, since this will take only a few instructions.

However, contexts vary in size, and we wouldn't want to have to walk the list each 
time to see if we have a context big enough to meet our request. So, we only save contexts 
that meet a common size. This way, any time we request a context of this "common" size, 
we can yank the first one off of the free list and use it. The format of the free context list 
is shown in figure 5.4. 

The first context in the free context list is pointed to by the CONTEXT_FREE_LIST
operating system variable. If no contexts are in the free list, the OS variable is set
to NIL. Each context in the free list points to the next context in the list by the context's
NEXT_CONTEXT slot as shown previously in figure 5.3. The final context in the free list
has its NEXT_CONTEXT slot set to NIL.
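
A minimal sketch of the free-list scheme follows. It assumes a single "common" context size, here an arbitrary four words of user space; `ContextPool`, `Context`, and `COMMON_USER_WORDS` are hypothetical names, and the general heap used for other sizes is not modelled.

```python
MIN_CONTEXT_WORDS = 14   # header, bookkeeping and processor state (section 5.4.2)
COMMON_USER_WORDS = 4    # hypothetical "common" amount of user space


class Context:
    def __init__(self, user_words):
        self.user_words = user_words
        self.size = MIN_CONTEXT_WORDS + user_words
        self.next_context = None       # the link slot to the next free context


class ContextPool:
    def __init__(self):
        self.free_list = None          # plays the role of the free-list variable
        self.fresh_allocations = 0

    def new_context(self, user_words):
        if user_words == COMMON_USER_WORDS and self.free_list is not None:
            ctx = self.free_list       # pop the head: a few operations
            self.free_list = ctx.next_context
            ctx.next_context = None
            return ctx
        self.fresh_allocations += 1    # slow path: allocate a new object
        return Context(user_words)

    def free_context(self, ctx):
        if ctx.user_words == COMMON_USER_WORDS:
            ctx.next_context = self.free_list
            self.free_list = ctx       # push onto the head of the list
        # Other sizes go back to the general heap (not modelled here).
```

Since every context on the list has the same size, allocation never has to walk the list: it either takes the head or falls through to a fresh allocation.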
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Figure 5.4: The Free Context List 
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5.4.4 Restarting a Context 

The operating system provides one primitive message (RESTART_CONTEXT) and two
system calls (XFER_ID and XFER_ADDR) to restart a context. The system calls take
either an ID or a physical address of a context, and restart it, copying the processor state
from the context to the processor registers. The RESTART_CONTEXT message takes a context ID
and transfers control to it by calling the XFER_ID system call on the context ID.

5.5 The Resource Wait Table 

The resource wait table is a system data structure that indicates which contexts are waiting 
for which services. It consists of two parts. The first part of the wait table is a fixed size 
associative table that binds resource IDs to waiting contexts. Figure 5.5 shows a portion of 
a hypothetical table. We see several contexts waiting for ID1, one context waiting for ID2, 
and the rest of the slots are empty. Empty slots are set to NIL. When a resource arrives, 
the wait table is searched, and the contexts in the list bound to the ID are restarted. 

Searching this table is fast, but unfortunately, we cannot bound the number of
entries that try to occupy the table. At some time, we may run out of room. When this
happens, we resort to a slower form of data structure and link the contexts waiting on
resources in a list called the resource overflow list. If we don't find a binding in the table,
we begin searching the list of contexts. Since each context has a RESOURCE_NEEDED
slot, we can always tell what resource the context is waiting for. This provides us a way to
continue if the table becomes full. By sizing the table appropriately, it may be possible to
limit use of the overflow list to a minimum.
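
The two-part structure can be sketched as a bounded table plus a linear overflow list. The class and method names below are hypothetical, and the sketch deviates from the description in one respect, noted in the comments: it scans the overflow list even after a table hit.

```python
class ResourceWaitTable:
    def __init__(self, slots):
        self.slots = slots       # fixed number of table entries
        self.table = {}          # resource ID -> list of waiting contexts
        self.overflow = []       # (resource ID, context) pairs, searched linearly

    def wait(self, resource_id, context):
        if resource_id in self.table:
            self.table[resource_id].append(context)
        elif len(self.table) < self.slots:
            self.table[resource_id] = [context]
        else:
            self.overflow.append((resource_id, context))   # table is full

    def resource_arrived(self, resource_id):
        """Return the contexts to restart, removing them from the structures."""
        ready = self.table.pop(resource_id, [])
        # Deviation from the text, for safety: the overflow list is scanned
        # even after a table hit, in case earlier waiters landed there.
        ready += [ctx for rid, ctx in self.overflow if rid == resource_id]
        self.overflow = [(rid, ctx) for rid, ctx in self.overflow
                         if rid != resource_id]
        return ready
```

With a two-slot table, a third distinct resource ID spills to the overflow list, yet its waiters are still found when the resource arrives.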
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Figure 5.5: The Resource Wait Table 
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Figure 5.6: The Resource Wait Overflow List 
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Figure 5.7: A Parallel Resource Request Bottleneck in a 3 x 3 Network 



5.6 Removing Method Caching Bottlenecks with Distribution Trees



The current scheme for method caching implies that in many cases, nodes wanting methods 
will have to ask the birthnode of the method (or at least the residence node) for a copy. 
If many nodes simultaneously need the same method (as will likely happen with highly 
parallel execution), then the birthnode will be deluged with method requests which it can 
only handle sequentially. These bottlenecks could degrade performance considerably. For 
example, figure 5.7 shows a network of 9 processing nodes. Suppose nodes 2 - 9 all requested 
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a method copy from node 1. Node 1 would receive a barrage of 8 requests for the method 
which would eliminate all parallelism, since it could consider each request only sequentially. 
One way to reduce the threat of performance degrading bottlenecks is to set up a 
distribution hierarchy, so that each node requests resources from its local distribution center 
(the distribution hierarchies are different for different resources). Each of these local centers 
would make requests to its superior, all the way up to the master resource center. We can 
use this type of distribution graph to help in requesting method copies (or copies of any 
type of immutable data for that matter). 

Take again the 3 x 3 node network example, where 8 nodes request a method from 
node 1, but this time impose a distribution bureaucracy like that shown in the tree in figure 
5.8. This time, node 1 only has to handle 3 messages, from nodes 2, 4 and 5. Each of these
nodes serves as a local distribution center for the remaining nodes. Node 2 services nodes 3
and 6, node 4 services nodes 7 and 8, and node 5 services node 9. In this manner we have
permitted more parallelism to continue, as well as limiting the burden on node 1 (which
could cause queue overflow, network blocking, and other conditions where performance
degrades considerably).

Let's now discuss some ways that a distribution tree method caching scheme can be
implemented in the Jellybean Machine system software. First, what are the constraints we
are working under?

• The distribution tree edges must be easily computable 

• We need to make reasonable choices for branching factor versus tree depth. Too high a 
branching factor might create bottlenecks, but too low a branching factor would tend 
to cache unnecessary copies, and suffer long latency as the birthnode was many edges 
away from the requesting node. 

• We would like to have significantly different trees for different resources. Different 
methods should have different distribution hierarchies, again to decrease bottlenecks, 
and to distribute resources more thoroughly. 

One fairly simple first attempt at a distribution tree formula might be to go to the 
distribution center that is halfway between the current node and the birthnode in terms 
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Figure 5.8: A Distribution Tree Bureaucracy To Balance 



Load in a 3 x 3 Network 
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of hops. In other words, to find the next regional distribution center, given the birthnode 
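coordinates $(x_b, y_b)$ and our current coordinates at $(x_c, y_c)$, we would calculate the halfway
coordinates $(x_i, y_i)$ by:

$$\Delta x_{real} = \frac{x_b - x_c}{2}, \qquad \Delta y_{real} = \frac{y_b - y_c}{2}$$

$$\Delta x = \begin{cases} \lceil \Delta x_{real} \rceil & \text{if } \operatorname{sgn} \Delta x_{real} \ge 0 \\ -\lceil |\Delta x_{real}| \rceil & \text{if } \operatorname{sgn} \Delta x_{real} < 0 \end{cases}$$

$$\Delta y = \begin{cases} \lceil \Delta y_{real} \rceil & \text{if } \operatorname{sgn} \Delta y_{real} \ge 0 \\ -\lceil |\Delta y_{real}| \rceil & \text{if } \operatorname{sgn} \Delta y_{real} < 0 \end{cases}$$

$$x_i = x_c + \Delta x, \qquad y_i = y_c + \Delta y$$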

This is in fact the algorithm used to create the distribution tree in figure 5.8. Figure 5.9
shows several distribution trees created by this algorithm for networks of various sizes and
various birthnodes. This method creates trees with depth at most $\log_2 m + 1$ for a network
with a maximum dimension of m nodes. So, for a reasonable sized machine of 4096 nodes
(64 x 64) we would at most have to traverse $\log_2 64 + 1$ or 7 edges of the distribution tree.
For enormous systems, say 1K nodes on a side, the tree depth will be only 11.
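
The halfway rule is easy to check with a short sketch. The code below assumes that the half-distance is rounded away from zero, which is what reproduces the 3 x 3 tree described earlier, and that nodes are numbered 1 to 9 row by row; the function names and the numbering scheme are hypothetical.

```python
import math


def _half_step(birth, current):
    """Half the distance to the birthnode, rounded away from zero."""
    real = (birth - current) / 2
    magnitude = math.ceil(abs(real))
    return magnitude if real >= 0 else -magnitude


def next_center(birth, current):
    """Coordinates of the next distribution center on the way to the birthnode."""
    (xb, yb), (xc, yc) = birth, current
    return xc + _half_step(xb, xc), yc + _half_step(yb, yc)


def depth(birth, current):
    """Number of tree edges between a node and the birthnode."""
    edges = 0
    while current != birth:
        current = next_center(birth, current)
        edges += 1
    return edges


# 3 x 3 example: node n (numbered 1..9, row by row) sits at ((n-1) % 3, (n-1) // 3).
def coords(n):
    return (n - 1) % 3, (n - 1) // 3


def number(xy):
    return xy[1] * 3 + xy[0] + 1


parents = {n: number(next_center(coords(1), coords(n))) for n in range(2, 10)}
```

For birthnode 1, `parents` gives nodes 2, 4 and 5 as children of node 1, nodes 3 and 6 under node 2, nodes 7 and 8 under node 4, and node 9 under node 5, matching the tree described for the 3 x 3 network.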
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Figure 5.9: Example Distribution Trees for Several 
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Chapter 6 



System Support of a 
Type-Dispatched Calling Model 



We never sent a messenger save with 

the language of his folk, that he 

might make the message clear for them 

— The Koran, 13:11 



One of the most important aims of the Jellybean Machine is to provide a concurrent 
processor that efficiently supports object-oriented, late-binding procedure activations. This 
chapter introduces the idea of message-passing and late-binding programming methodolo- 
gies, and discusses the system services in the Jellybean Machine operating system that 
support this manner of programming. 

6.1 Message-Passing and Object-Oriented Languages

There has been much interest during the past few years in "object-oriented" programming. 
Though this term is not particularly precise, it does describe a fairly cohesive set of languages 
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exhibiting behavior markedly different from the typical Algol-like programming style. There 
are two characteristics in particular that languages typically categorized as object-oriented
share.



First of all, operations tend not to be thought of as functions applied to data objects,
as they are in Algol derivatives. Instead, data objects are "personified" as "actors" that
receive requests made of them. These requests are made by "sending a message" to an
object called the receiver of the message. The operation that was requested of the object
is typically called the selector, since it selects the operation to be performed. So, where a
standard Algol-like language might calculate the determinant of a matrix m by

determinant (m) ; 

an object-oriented implementation might look something like

(send m 'determinant) 

We call this concept of performing operations by sending selectors to objects the message-passing
paradigm. This paradigm turns out to be a very convenient model of computation.

The second characteristic of object-oriented languages that makes them appealing is
the fact that the operations on different data-types can have the same names. This allows
us, for example, to have an 'area selector for circle data types, as well as an 'area selector for
polygon data types. In many other languages this would cause a naming conflict, requiring
us to set up an explicit naming convention, such as calling circle_area() and polygon_area()
routines on objects of the proper type.

But, more importantly than just saving us the hassle of naming conflicts, object- 
oriented languages actually decide which procedure to run for a certain data type. In other 
words, when an 'area selector arrived at an object, the system would decide whether this 
object is a circle or a polygon and automatically run the correct procedure. In addition, 
if the receiver of the 'area selector was not a data type that supported the area operation 
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(such as an integer), then an error would be reported by the system. In Algol-like languages, 
it is the burden of the programmer to know the type of the object he is dealing with, so he 
can call the proper operation. This is crucial in many symbolic languages with loose type- 
checking, like Lisp, where we can have lists of many different types of objects 1 . This is called 
a late-binding activation since we don't decide what routine will be run at compile-time, 
but instead wait until later, when the message send is actually done. 

Operations with the same name and semantically similar meaning supported by 
various data types are called generic operations since these operations represent the generic 
behavior the programmer wants to accomplish (add things, draw things, calculate areas of 
things). The specific behavior is calculated at run-time once we know the data type of the 
object (called the class of the object), and the selected operation, by a process known as 
class-selector lookup. 
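
A small Python sketch may make the idea of generic operations and class-selector lookup concrete. Everything here is illustrative: the `METHODS` table, the dictionary representation of objects, and the `send` function are assumptions made for the example, not part of the Jellybean system.

```python
import math

# (class, selector) -> procedure; every name below is illustrative only.
METHODS = {
    ("circle", "area"): lambda shape: math.pi * shape["radius"] ** 2,
    ("polygon", "area"): lambda shape: abs(sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(shape["points"],
                                      shape["points"][1:] + shape["points"][:1])
    )) / 2,
}


def class_of(receiver):
    return receiver["class"] if isinstance(receiver, dict) else type(receiver).__name__


def send(receiver, selector):
    """Late-binding send: pick the procedure from the receiver's class at run time."""
    try:
        method = METHODS[(class_of(receiver), selector)]
    except KeyError:
        raise TypeError(f"{class_of(receiver)} does not support {selector!r}") from None
    return method(receiver)


shapes = [
    {"class": "circle", "radius": 1.0},
    {"class": "polygon", "points": [(0, 0), (2, 0), (2, 1), (0, 1)]},
]
areas = [send(shape, "area") for shape in shapes]
```

The caller sends the same 'area selector to every element of the list; the procedure actually run is chosen from the class of each receiver, and sending 'area to an integer raises an error.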

So, object-oriented languages have two main components:

1. Objects are activated by the message-passing paradigm rather than a more
applicative model of programming.

2. Each data type has its own set of supported operations, where names can be the same
as in other data types and may represent generic operations over varied data types.
Activations are caused by late-binding sends which look up the specific operation to run
based on the class of the object receiving the message (the receiver) and the selected
operation (the selector).

Our goal now is to provide a system substrate that will efficiently and conveniently support 
these aims. 



1 Consider an object-oriented drawing program, where we have a list of many different
objects in a picture. A convenient way to refresh the screen in an object-oriented
system is to send a draw message to each object in the list. Based on the data type of each object at
run-time, the appropriate routine (circle draw, rectangle draw, text draw, etc.) is activated.
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Figure 6.1: Format of the SEND Message 



6.2 Late-Binding Send Execution Support 



The next task of the operating system is to provide a mechanism to simulate the message-passing
paradigm. We already have network communication hardware that allows data to
be sent between nodes. We also have a global object namespace provided by the virtual
memory extensions. Together, we can use these components to implement the message-passing
execution model.

To do this, we implement one more primitive message, the SEND message
(not to be confused with the SEND machine instruction). This primitive message handler
acts in the object-oriented manner we showed earlier. Figure 6.1 shows the significance of
the different words of the message. The first word is the address of the SEND message
handler, the second word is the selector, the third word is the receiver. The rest of the
words are arguments, and information about where to reply to.

When the SEND message arrives on the node that the receiver resides on (we forward
this SEND message to wherever the receiver resides) the primitive message handler is
started. Figure 6.2 shows a flow chart that describes how the SEND message handler works.
It first picks the class out of the receiver object (so we know what data type the receiver is).
We then merge the class and selector together into a class/selector word (shown in figure
6.3). Now that we have the class and selector, we try to see if there is a class/selector ->
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method ID binding in the cache. If so, we start the method with the CALL message as 
discussed in the previous chapter. If not, we need to lookup the binding. 

At the current time, we do not have enough insight into the characteristics of machine
behavior to feel comfortable locking down the class/selector lookup algorithm. For
this reason, we provide the lookup routine in a method. We insist that this method is allocated
before any others so it always has the same method ID. This LookupMethod method
takes the class and selector, and consults some distributed system table to find the method
ID corresponding to this class and selector.
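
The cache check and the fallback to the lookup routine can be sketched as follows. The 16-bit selector field, the function names, and the decision to store the looked-up binding in the cache afterwards are all assumptions of this sketch; the actual word layout is the one given in figure 6.3.

```python
SELECTOR_BITS = 16  # assumed field width; the real layout is given in figure 6.3


def class_selector_word(class_number, selector_number):
    """Merge class and selector into a single lookup key."""
    return (class_number << SELECTOR_BITS) | selector_number


def resolve_send(receiver_class, selector, cache, lookup_method):
    """Return the method ID to CALL for this class/selector pair."""
    key = class_selector_word(receiver_class, selector)
    method_id = cache.get(key)
    if method_id is None:
        # Cache miss: defer to the replaceable lookup routine, then remember
        # the answer (filling the cache here is an assumption of this sketch).
        method_id = lookup_method(receiver_class, selector)
        cache[key] = method_id
    return method_id


# Stand-in for the distributed table consulted by the lookup routine.
SYSTEM_TABLE = {(1, 7): "method-42"}
lookups = []


def lookup_method(class_number, selector_number):
    lookups.append((class_number, selector_number))
    return SYSTEM_TABLE[(class_number, selector_number)]


cache = {}
first = resolve_send(1, 7, cache, lookup_method)
second = resolve_send(1, 7, cache, lookup_method)
```

Passing the lookup routine in as a parameter mirrors the indirection in the text: the slow path can be replaced without touching the handler.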

6.3 Loading Class/Selector Methods into the System 

Let's now briefly look at how the class/selector method information is loaded into the Jellybean
system. Figure 6.4 shows the schema for how the compiler and run-time environment
will interact with the Jellybean Machine processing network. The compiler is responsible
for generating class and selector numbers and for compiling the source language into MDP
machine code. A certain node of the network is picked for the method to reside on by some
distribution policy. The method data as well as the class and selector that this method
represents are sent to this chosen node by the NEW_METHOD message. The format of a
NEW_METHOD message is shown in figure 6.5.

When a NEW_METHOD message arrives at a node, the NEW_METHOD message
handler begins executing. It makes an object to hold the method, and copies the code from
the message into the object. The NEW_METHOD handler then calls the InstallMethod
method which takes the class, selector, and method ID and makes the bindings in the
class/selector -> method ID data structures.

Specification of the class/selector -> method ID data structures has been ignored
without attempts at subtlety. We do not have enough insight to definitively specify the
best format for these tables. We can talk a bit about the issues involved. (1) We should
be able to take a class/selector word and efficiently find the corresponding method ID.
(2) The table should be distributed around the network in a way that minimizes bottlenecks.

Figure 6.2: Flowchart of the SEND Message Handler

Figure 6.3: Class/Selector Word Format

Figure 6.4: A Coarse View of the Compiler/Machine Interface

Figure 6.5: Format of the NEW_METHOD Message

A reasonable way of doing this would be to apply some "bit-twiddling" function
to the class/selector words to decide what node is responsible for knowing their bindings.
The actual data structures could be hashed, or perhaps each class would have an object
that holds the method IDs for every selector. One annoying problem with any approach
is the bootstrapping problem: we need to know how to get to the data in the first place.
Because of the added indirection through the LookupMethod and InstallMethod handlers, we
have the flexibility to try several approaches and test their performance in the future.
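
To make the idea concrete, here is a minimal sketch of one such arrangement, written in Python rather than MDP code. Everything in it is an assumption made for illustration: the XOR-fold used as the "bit-twiddling" function, the 64-node network size, and the names `responsible_node`, `BindingTable`, `install` and `lookup` are hypothetical, and the per-node table is simply hashed.

```python
from typing import Dict, List, Optional, Tuple

NUM_NODES = 64  # hypothetical network size, chosen only for the example


def responsible_node(class_id: int, selector_id: int, num_nodes: int = NUM_NODES) -> int:
    """Hypothetical "bit-twiddling" function: fold the class and selector
    numbers together and reduce the result to a node number."""
    key = (class_id << 16) | (selector_id & 0xFFFF)
    folded = key ^ (key >> 16) ^ (key >> 8)
    return folded % num_nodes


class BindingTable:
    """One node's share of the class/selector -> method ID bindings (hashed)."""

    def __init__(self) -> None:
        self._bindings: Dict[Tuple[int, int], int] = {}

    def install_method(self, class_id: int, selector_id: int, method_id: int) -> None:
        self._bindings[(class_id, selector_id)] = method_id

    def lookup_method(self, class_id: int, selector_id: int) -> Optional[int]:
        return self._bindings.get((class_id, selector_id))


# One table per node; in the real machine these would live on separate processors.
tables: List[BindingTable] = [BindingTable() for _ in range(NUM_NODES)]


def install(class_id: int, selector_id: int, method_id: int) -> None:
    """Route the binding to the node that the fold function makes responsible."""
    tables[responsible_node(class_id, selector_id)].install_method(
        class_id, selector_id, method_id)


def lookup(class_id: int, selector_id: int) -> Optional[int]:
    """Ask the responsible node for the method ID, or None if unbound."""
    return tables[responsible_node(class_id, selector_id)].lookup_method(
        class_id, selector_id)
```

Because every caller computes the responsible node locally, no central directory is consulted, which is the property issue (2) asks for. How evenly a particular fold spreads real class/selector words is something only measurement can tell us.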

6.4 Returning Values 



Return values can be sent with the REPLY message. This message specifies the context
to reply to, the slot number of the context to fill, and a word of reply data. The reply
data is passed by value if it is a primitive data word, or by reference if an object is to
be returned.
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6.5 Summary 

The class/selector calling model is a convenient mechanism for invoking tasks. By
implementing it in the operating system kernel, we can keep method invocation efficient.
To provide extensibility, we provide hooks to the LookupMethod and InstallMethod
handlers, so these routines can be reconfigured independently of the rest of the kernel.



Chapter 7 

Storage Reclamation in the 
Jellybean Machine 



But virtue, as it never will be moved, 

Though lewdness court it in a shape of heaven, 

So lust, though to a radiant angel linked, 

Will sate itself in a celestial bed, 

And prey on garbage 

— Shakespeare, in Hamlet I, V. 53 



7.1 Introduction 

The successful performance of our machine relies on the fact that sufficient parallelism
exists on the grain of methods. In order for this to happen, it is important that data
dependencies on shared objects are minimized by adopting a more functional approach,
where methods interact by value rather than by reference as much as possible. This
situation promotes a large number of small, short-lived objects. Because of the minute
amount of memory on each processing node, an efficient storage reclamation mechanism
becomes an important facet. The characteristics of our system, however, cause many
straightforward methods of storage management to break down. In this discussion we will
examine some of the important properties of the Jellybean Machine, and the ways these
properties influence reclamation. The rest of this chapter provides a discussion of the
issues pertaining to reclamation on the Jellybean Machine, and a possible first cut at a
garbage collection algorithm.



7.2 Automatic Collection is Desirable 

Because the system is object oriented, and because we have a small memory with frequent 
allocations, object reclamation is important. Because objects can be shared in complex 
ways, and because of the high level programming model we wish to support, we wish most 
object deallocations to be handled automatically by a "garbage collector" that searches for 
objects that are no longer in use (i.e. there are no pointers to the object anywhere) and 
deallocates them when necessary. 

7.3 Choosing a Collection Approach 

Several characteristics of the Jellybean Machine will guide us in the choice of garbage 
collection. Let's remind ourselves of the character of the machine. 

7.3.1 Memory Organization 

The memory in a Jellybean processor is small, and it is local to that processor. Memory 
allocation is done in a simple contiguous manner. Compaction can be done in parallel 
very quickly. Memory objects are segment-based and are given unique object id's. In 
addition, these object id's are concatenated with a birth node number to provide a global 
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virtual address. The virtual to physical translation mechanism uses caching to improve 
name resolution, but this relies on locality. Random access to many addresses could be 
very expensive. 

7.3.2 Addressing System and Network Topology 

The Jellybean Machine uses a distributed memory to provide "site autonomy" [LS80] in 
order to perform local operations very fast, and avoid memory conflicts. But, the tradeoff is 
that foreign accesses will be very costly, involving a message send mechanism that is at least 
an order of magnitude slower. In addition, distributed memory can require synchronization, 
and the delays of network communication may make certain synchronization conditions 
impossible. The network may cause bottlenecks to occur if too many messages are sent to 
one place, and may hold data in transit. The network latency may also be a factor. 

7.3.3 Garbage Collection Character 

Garbage collectors take on various different characters. The common approach of reference
counting collection doesn't appear to be feasible in the Jellybean Machine because (1)
it cannot collect cyclic data structures, (2) every pointer change will require a (possibly
remote) object access, and (3) we are not always aware when "dead" pointers get changed.
For these reasons, we decided to attempt some variant of a pointer chasing garbage
collection mechanism. The next section describes the implementation of a pointer chasing
garbage collector for our machine in some detail.

7.4 A Pointer Chasing Garbage Collector 

There are several properties that we would like our garbage collector to have. 
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7.4.2 Problems 

Synchronization and "Travelling References" 

A major problem in garbage collection across a communication medium is lack of synchro- 
nized, instantaneous transmission. This shows itself in garbage collection in a few ways. 
One of the more annoying problems is how to be sure that the last pointer to an object 
isn't in transit when the garbage collector comes along. The garbage collector doesn't see 
any pointers in the network, so an object may be deleted because a pointer was "travelling" 
between nodes where it can't be noticed. We can refer to this as the travelling reference 
problem. Figure 7.1 shows a portion of a network of processors, where an ID of an object 
is in the network when the collector is run. 

An obvious way to resolve this situation is to prevent all upcoming message sends 
during collection, so that no other pointers are mailed into the network, and then to wait 
until all messages in transit have landed in a queue. We can tell when all messages have 
landed by either waiting a length of time we know to be longer than the maximum latency 
from the most distant nodes, or by sending "scout" or "bulldozer" messages down the 
network dimensions. When all these "bulldozer" messages arrive, they will have pushed all 
other messages out of the way, and the network will be empty. 
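
The travelling reference problem and the effect of draining the network can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the heap is one dictionary rather than per-node memories, each in-flight message carries a single object ID, and the names `reachable` and `drain` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reachable(roots_per_node: List[Set[int]], heap: Dict[int, Set[int]]) -> Set[int]:
    """Mark phase: chase pointers from every node's roots.
    heap maps an object ID to the set of object IDs it points to."""
    marked: Set[int] = set()
    stack = [oid for roots in roots_per_node for oid in roots]
    while stack:
        oid = stack.pop()
        if oid in marked or oid not in heap:
            continue
        marked.add(oid)
        stack.extend(heap[oid])
    return marked


def drain(in_transit: List[Tuple[int, int]], roots_per_node: List[Set[int]]) -> None:
    """Let every in-flight message (dest_node, object_id) land, so that the
    reference it carries becomes visible in the destination node's queue."""
    while in_transit:
        dest, oid = in_transit.pop()
        roots_per_node[dest].add(oid)


# Object 1 is referenced only by a message travelling to node 1.
heap = {1: set(), 2: {3}, 3: set()}
roots = [{2}, set()]
in_transit = [(1, 1)]

live_before = reachable(roots, heap)   # object 1 is missed
drain(in_transit, roots)
live_after = reachable(roots, heap)    # object 1 is now found
```

Collecting with `live_before` would delete object 1 even though a reference to it is about to arrive. Collecting with `live_after` is safe, but only because sends were frozen while the network emptied.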

Problems With Disabling Sends 

In order to prevent the travelling reference problem, we have to

• Disable sends so no new references enter the network.

• Wait for all messages in the network to land.

But we have no explicit mechanism in the MDP processing node to disable sends¹. If we
did, we could allow the processors to run until they tried to execute one of these disabled
instructions. When this happened, a fault could occur and some manner of process halting
could take place (such as saving a context for the process for later restarting²).

¹ Or, preferably, a mechanism that would disable any sends that would cause a reference to
be mailed into the network; all other messages could continue.

Figure 7.1: Object ID Travelling in Network

A possible way to resolve this problem at first might be to place guards in certain
high-level execution handlers such as SEND and CALL. These handlers are run when a
SEND or CALL message (two messages that ask a node to start executing a method)
arrives. Inside these handlers we could have a guard that would defer the execution of
the method until collection finishes. This goes a long way toward resolving the problem of
travelling references if most of the code that mails IDs around is code that is executed
with CALL and SEND³.

Another way to shut down the machine might be to disable the queue execution.
This would cause messages to back up in the queues. Certain messages that we would still
want to execute could be handled by having the processor "walk" the queue by hand, looking
for certain types of messages (such as garbage collection messages). It could also pull
items out of the queue and into the heap to prevent queue overflow.

Problems With Background Execution 

Since, at the start of garbage collection, we stop message sends by various possible
mechanisms, our concurrent machine is effectively shut down. This violates our desire for
the collector to run in the background, in parallel with method execution.

² This might [...] of insufficient memory for a context allocation. [...] When there is not
enough local memory, [...] the allocation on a remote node. But this requires mailing
references into the network, which is exactly what we are trying to avoid. This underscores
the difficulty present in providing efficient, convenient methods of preventing travelling
references.

³ [...] CALL and SEND messages; all other messages are [...] system messages (where the
system may have to be responsible for avoiding ID mailing during collection). [...] NEW
objects, and [...] function returns. If we [...] of a CALL or SEND [...], then the guard
should eventually stop the machine, with every processor being idle or [...] a function.
This implementation has at least 2 requirements that we must always keep [in mind]. (1) We
must insure that all non-CALL and non-SEND messages do not violate the rules and mail
references during garbage collection time. (2) Catastrophe can occur when we run out of
memory trying to make contexts to hold the deferred execution requests.

In addition, the lack of a register set for background mode prevents any way for the
Message Driven Processor to take advantage of idle time in a reasonable way. Since any
message would take priority over background mode, the register set will be trashed. Any
computation done in background mode must shut off interrupts, which, instead of taking
advantage of idle time, takes advantage of application execution time! Some compromises
can be made, such as having background mode start up small units of computation by
sending priority messages, or by queuing up contexts of waiting-to-run background processes
that are begun by a context startup message send when the background loop is entered.
Again, various improvements should be examined.

7.5 Summary 

The characteristics of the Jellybean Machine necessitate a heap collector to reclaim storage.
This collector may have to run often (since our nodes have such a small amount of memory).
A reference counting approach seems to be out, since there is a large overhead in changing
the object reference counts (and it is difficult to know when a reference is written over
and thus deleted), as well as the fact that it cannot handle cyclic structures (if we insist
that cyclic structures are illegal, that results in a big loss in terms of flexibility; if we
don't collect these structures, we will rapidly run out of memory). A pointer chasing
collector has problems with travelling references (where we fail to see the last reference
to an object because it is in the network, and thus delete the object), but seems to be the
most viable approach. It would be desirable to have the collector run in the background
without shutting the machine down, but the travelling reference problem seems to make
this difficult.



Chapter 8 



Support for Concurrent 
Programming Languages 

/ get by with a little help from my friends. 
— John Lennon and Paul McCartney, in "A Little Help From My Friends" (1967) 



The Jellybean Machine Operating System Software provides several noteworthy 
services to support concurrent programming languages, both for functional and efficiency 
reasons. These include (1) the SEND and REPLY message handlers, (2) futures, (3) dis- 
tributed objects, and (4) the interaction interface. 

8.1 High-Level Languages

8.1.1 CST 

Currently, the high-level language being used in the Jellybean Machine project is a
Smalltalk-80 based language called CST (Concurrent SmallTalk) [DC]. CST uses a Lisp-like
prefix syntax, and codes sends implicitly in a function application metaphor. CST allows
asynchronous messages to exploit concurrency, and fully utilizes the late-binding execution
model. Locks are provided for explicit synchronization, and a "distributed object" data
type exists to scatter object state over a large area. This CST code will be compiled to
intermediate code, which is passed through a back end that converts the i-code to MDP
machine code and loads it into the system. The compilation and loading mechanism was
previously sketched in figure 6.4.

The rest of this chapter describes several operating system services that support the 
execution of the object-oriented model of computation. 

8.2 SEND and REPLY 

As discussed in earlier chapters, the SEND message handler provides the machinery to run
a method based on the class of a receiving object and the selector symbol "sent" to the
object. In the current system, the SEND message may also describe one object to return a
value to. This return slot is specified by passing the ID of the object to hold the returned
value (the returned value must be one word, either a primitive value such as an integer or
a symbol, or the ID pointer to the object), the slot (index into the object) number, and the
node the object is on.

The REPLY handler actually performs the return of the value. The REPLY message
mails the target object ID, the target variable number, and the one-word return value to the
node number specified in the SEND message. When a REPLY message arrives at a node,
the returned value is stored in the indicated slot of the target object, and any processes
waiting for a variable to be filled by a reply are restarted.



8.3 Futures 

8.3.1 Conforming to Data Dependencies 



Data dependencies impose an order on execution. If a computation result is used in a
calculation, the result must be available before the calculation can occur. In a sequential
processor, there is no problem. The instructions are ordered in such a way as to insure that
previous results are available in certain places before those values are needed. In a
distributed processor, on the other hand, a computation may take an indeterminate amount
of time to complete on a remote node. Because of this, we may get to a point where a value
is needed before the calculation of the value has completed. It is necessary to wait until
this result returns before continuing the calculation.

8.3.2 The Check's in the Mail 

This section details a mechanism used prominently by the Jellybean Machine to impose data 
dependency orderings conveniently. The mechanism is quite simple. Whenever a calculation 
is spawned off in parallel, the destination location where the value of the calculation is to 
be stored is filled with a specially tagged value, called a context future, indicating that the 
value will arrive to the context in the future. When the calculation replies with the value, 
the future is overwritten with the real value of the computation. 

When an access is made to a location in a context, using the value located there, 
there is the possibility that the value hasn't replied yet. We can tell if the value hasn't 
returned yet, because it will be filled with a context future (c-future) if it hasn't. Any read 
of a location containing a c-future will cause the processor to fault, (1) saving the processor 
state in the context object and (2) marking the context as waiting for a c-future. When a 
reply arrives to a context, the context is checked to see if it is waiting on a c-future. If so, 
it is queued to be restarted. 
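
The mechanism can be sketched in Python as follows. This is an illustrative model, not the MDP implementation: the tagged c-future word is modelled by a sentinel object, the processor fault by an exception, and the class and method names (`Context`, `spawn_into`, `read`, `reply`) are hypothetical. Saving the processor state is elided.

```python
class CFuture:
    """Stand-in for the specially tagged c-future word."""

    def __repr__(self) -> str:
        return "<c-future>"


C_FUTURE = CFuture()


class FutureFault(Exception):
    """Models the processor fault taken when a c-future is read."""

    def __init__(self, slot: int) -> None:
        super().__init__(f"read of c-future in slot {slot}")
        self.slot = slot


class Context:
    def __init__(self, num_slots: int) -> None:
        self.slots = [None] * num_slots
        self.waiting = False  # True while suspended on a c-future

    def spawn_into(self, slot: int) -> None:
        """A calculation was spawned in parallel; mark its destination slot."""
        self.slots[slot] = C_FUTURE

    def read(self, slot: int):
        """Return the slot value, faulting if the value has not arrived yet."""
        value = self.slots[slot]
        if value is C_FUTURE:
            self.waiting = True  # (2) mark the context as waiting
            raise FutureFault(slot)
        return value

    def reply(self, slot: int, value) -> bool:
        """Overwrite the future with the real value.
        Returns True if the context should be queued for restart."""
        self.slots[slot] = value
        restart = self.waiting
        self.waiting = False
        return restart
```

A context that reads slot 0 before the reply arrives takes a `FutureFault` and is marked waiting; the later `reply(0, value)` returns True, telling the system to queue the context for restart. A context that never reads the slot early pays nothing.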
    Advantages                 Disadvantages
    -----------------------    ------------------
    Simple                     Large Inertia
    Transparent                Parallelism Wasted
    Minimal Synchronization    False Restarts

Table 8.1: Pros and Cons of Dependency Enforcement by Futures



Let's examine this context-future mechanism in a bit more detail to see what it 
really provides us and what deficiencies it faces. Table 8.1 itemizes some of the advantages 
and disadvantages of the future mechanism. 



8.3.3 Advantages 

As we said earlier, the most desirable characteristic of the c-future approach is that it is
simple to implement and understand. It fits well into the existing system, being
"optimistic": it takes advantage of the fault mechanism, the tagged architecture, and
contexts.

Being transparent to the programmer/compiler writer is desirable as well. No 
burden is placed on the code generator to explicitly keep track of non-completed tasks. 
No extra instructions need to be placed in-line to check for the presence of values, or to 
manipulate semaphores. 

Finally, the future approach only pays the price of synchronization if it is necessary.
If a value returns before it is needed, or if an arm of a conditional is never executed,
we will not need to pay the synchronization price¹.

¹ Though we do require all replies to be in before we deallocate a context, so we can re-use context IDs.
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8.3.4 Disadvantages 

On the other hand there are several disadvantages to this approach. The system is subject 
to high inertia. The total cost of halting and saving a context and restarting it when 
the return value arrives is relatively high. The worst case occurs when we have many 
dependencies following one after another. Here, we would keep halting and restarting, 
making very little progress. It can be difficult to gain any momentum, because of the time 
spent saving and restarting contexts. This case isn't quite so bad if we have other tasks 
queued up that can take advantage of the free time, and if the replies take a while to 
arrive (which is likely to be the normal case). The real question is one of balance between 
computation time and system overhead time. 

By controlling execution on the grain size of methods, whenever a sequential execution
encounters a c-future value, the entire method will be suspended. Thus once we hit
a c-future value, other possibly executable code in the method is not run. This is directly
the result of basing the grain of parallelism on the unit of methods, and it has the effect
of wasting parallelism as opposed to a more fine-grain execution model.

C-futures also can lead to a problem of false restarts, where a reply for a different
slot would restart the context, which would immediately halt on the same c-future again.
If we were waiting on variable A to return and a reply to fill variable B arrives, the context
would be restarted falsely, and when we read A we will hit the same future and halt again.
This is rectified in the prototype implementation by using the RESOURCE-NEEDED slot
of the context to hold the number of the slot the context needs to have filled. When a REPLY
arrives, the context is only restarted if it was waiting on the slot the REPLY came to fill.
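
The restart filter amounts to one comparison per REPLY. The Python sketch below shows the idea; the class `WaitingContext` and its method names are hypothetical, and `resource_needed` stands in for the RESOURCE-NEEDED slot of the context.

```python
from typing import Dict, Optional


class WaitingContext:
    def __init__(self) -> None:
        # Slot number the context is suspended on, or None if it is not waiting.
        self.resource_needed: Optional[int] = None
        self.slots: Dict[int, object] = {}

    def suspend_on(self, slot: int) -> None:
        """Record which slot's c-future caused the suspension."""
        self.resource_needed = slot

    def on_reply(self, slot: int, value: object) -> bool:
        """Store the reply; return True only if this reply fills the awaited slot."""
        self.slots[slot] = value
        if self.resource_needed == slot:
            self.resource_needed = None
            return True   # queue the context for restart
        return False      # a reply for some other slot: no false restart


A, B = 0, 1
ctx = WaitingContext()
ctx.suspend_on(A)
restart_on_b = ctx.on_reply(B, "value of B")   # stored, context stays suspended
restart_on_a = ctx.on_reply(A, "value of A")   # the awaited slot: restart
```

The reply for B is still stored in its slot, so when the context eventually restarts and reads B, the value is already there.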
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8.4 Distributed Objects 

A final system characteristic designed to support efficient high-level language execution is
the introduction of distributed objects. A distributed object is one whose state is broken
up into segments called constituent objects, which are scattered across the processing
network. Its purpose is to allow parallel access to different parts of an object.

A single object can only be directly accessed by the node it resides on, and the node 
it resides on can only run one task, implying that an object can only be computed on by 
one task at a time. In the absence of coherent caching strategies, this one-object — one-task 
constraint can potentially severely limit parallelism. 

By distributing parts of the object over several nodes we can provide some extra 
(albeit limited) concurrency. The hope is that this increase of concurrency along with the 
fact that an object-oriented programming model should provide access to many distinct 
objects being computed on at once will prevent object bottlenecks from becoming a serious 
performance hindrance. 

The system supports distributed objects by providing (1) allocation and (2) con- 
stituent lookup services. When a distributed object is allocated, the system creates con- 
stituent objects and scatters them in a reasonable way around the network. Each constituent 
object has a normal object ID number which is unique for each CO, and a distributed ID or 
DID which is the same for all constituents of a distributed object. This DID contains the 
information necessary to locate any constituent object. 

8.4.1 A Distributed ID Format 

Figure 8.1 shows a possible format for a distributed ID. The DID knows the number of
constituent objects, the hometown node of the first object, and a node-unique serial
number. This prototype DID format places a limit of 256 COs per distributed object and 256
distributed objects per node.

    | TAG | NUMBER OF CONSTITUENT OBJECTS (8 bits) | HOMETOWN-NODE ("ROOT") | SERIAL NUMBER (8 bits) |

Figure 8.1: Distributed ID Format
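
As an illustration of how such a word might be packed and unpacked, here is a Python sketch. The 8-bit widths of the constituent count and serial number follow from the limits of 256 quoted above. Everything else is an assumption made for the example: the 4-bit tag, its value, the 16-bit hometown field, the field order, and the choice to store the count minus one so that 256 COs fit in 8 bits.

```python
from typing import NamedTuple

# Field widths: COUNT and SERIAL follow the stated limits of 256;
# TAG and HOME widths are assumptions for this sketch.
TAG_BITS, COUNT_BITS, HOME_BITS, SERIAL_BITS = 4, 8, 16, 8
DID_TAG = 0xD  # hypothetical tag value marking a word as a DID


class DID(NamedTuple):
    constituents: int  # 1..256 constituent objects
    hometown: int      # node of the first constituent ("root")
    serial: int        # node-unique serial number, 0..255


def pack_did(did: DID) -> int:
    """Pack the fields into one word: tag | count-1 | hometown | serial."""
    if not 1 <= did.constituents <= (1 << COUNT_BITS):
        raise ValueError("constituent count out of range")
    if not 0 <= did.hometown < (1 << HOME_BITS):
        raise ValueError("hometown node out of range")
    if not 0 <= did.serial < (1 << SERIAL_BITS):
        raise ValueError("serial number out of range")
    word = DID_TAG
    word = (word << COUNT_BITS) | (did.constituents - 1)
    word = (word << HOME_BITS) | did.hometown
    word = (word << SERIAL_BITS) | did.serial
    return word


def unpack_did(word: int) -> DID:
    """Inverse of pack_did; rejects words that do not carry the DID tag."""
    serial = word & ((1 << SERIAL_BITS) - 1)
    word >>= SERIAL_BITS
    hometown = word & ((1 << HOME_BITS) - 1)
    word >>= HOME_BITS
    constituents = (word & ((1 << COUNT_BITS) - 1)) + 1
    word >>= COUNT_BITS
    if word != DID_TAG:
        raise ValueError("word is not tagged as a DID")
    return DID(constituents, hometown, serial)
```

Because the count, hometown and serial number are all in the word itself, any node holding a DID can compute where each constituent lives without consulting a table.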

8.4.2 Dealing out the Constituent Objects 

When a distributed object is allocated, we want to have a function that maps each
constituent object to a node number. This function should have two properties: it should
(1) be easy to compute, and (2) scatter objects in an acceptable manner.

The goal of distribution is to provide concurrency, so with this aim as the measure of 
success, any distribution scheme would be equivalent. But, we need to take into account how 
the processor load is distributed around the network as well. There are two dichotomous 
goals of constituent distribution, (1) to scatter the objects uniformly across the network so 
there are no hotspots and (2) to scatter the objects locally to prevent long distance network 
traffic. 

Dispersion or Locality? 



These seemingly contradictory aims argue against each other. If we scatter objects
uniformly, especially if there are very few objects, the data may lie very far away from the
majority of the computation. Even though some of the computation will migrate near the
data and spawn from there, there still may be a great deal of network traffic caused by
the processes still proceeding from the root of the computation. In time, migration of work
may balance the load appropriately, but we still have worries about uniform distribution.

On the other hand, if we clump the constituent objects close together, the computation
will cluster around the data, and not hinder the performance of the rest of the network
via long distance traffic, but this local hotspot may overwhelm the computational resources
of this local area of processors.

    stride = ⌊nodes / constituents⌋
    node_n = (birthnode + n × stride) mod nodes

Figure 8.2: Distribution of Constituent Objects

A Simple Dispersal Approach 

The first design of the distributed object system leaves this question for further study,
and adopts a simple, relatively disperse manner of dealing out constituent objects. We
adopt a simple uniform distribution strategy, hoping that the load balancing mechanisms
incorporated into the system will work effectively. To insure the efficiency of the
calculation of the function, we use the simple distribution algorithm shown in figure 8.2.
The node numbers we describe are a finite interval of numbers {n : 0 ≤ n < nodes}, which
we will call ordinal node numbers, and not the system network address node numbers, which
encode the total addressing space of the network. The conversion between the two formats
is simple. Figure 8.3 shows some sample distributions for various sized networks,
birthnodes, and constituent object counts.
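
The distribution rule of figure 8.2 is short enough to state directly in Python. The function name `constituent_nodes` is hypothetical; the arithmetic is the stride formula from the figure, applied to ordinal node numbers.

```python
from typing import List


def constituent_nodes(birthnode: int, nodes: int, constituents: int) -> List[int]:
    """Ordinal node number of each constituent object, per figure 8.2.

    Note that stride is 0 when there are more constituents than nodes,
    in which case every constituent lands on the birthnode.
    """
    stride = nodes // constituents
    return [(birthnode + n * stride) % nodes for n in range(constituents)]


# Four constituents born on node 2 of a 16-node network.
even_spread = constituent_nodes(birthnode=2, nodes=16, constituents=4)

# Three constituents born on node 8 of a 10-node network (wraps around).
wrapped = constituent_nodes(birthnode=8, nodes=10, constituents=3)
```

The first call spaces the constituents four nodes apart starting at the birthnode. The second shows the modulus wrapping the later constituents back to low node numbers.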
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Figure 8.3: Constituent Object Distribution Examples 






    l = ⌊(currentnode − birthnode) / stride⌋ × stride + birthnode
    r = ⌊(currentnode − birthnode + stride) / stride⌋ × stride + birthnode

    if l < birthnode then l = l − nodes mod constituents
    if r < birthnode then r = r − nodes mod constituents

    n = min(hops(currentnode, l), hops(currentnode, r))

Figure 8.4: Equations for Choosing a Nearby Constituent Object



8.4.3 Choosing a Constituent Object 



We now have a first-attempt mechanism to assign node numbers to each constituent object.
Given a constituent object, we can find the node of its residence. For simplicity, we prevent
constituent objects from being migrated. Now, we want to provide an algorithm to choose a
constituent object given a DID. We could do this randomly, but in order to take advantage
of locality, we want to choose a constituent object that is reasonably close to the current
node. We do this by finding the ordinal node numbers of the constituent objects on either
side of the current node number (l and r, for left and right) and choosing the one (n) with
the minimum distance in x-y hops. We have to be careful about "wraparound". The algorithm
is described in figure 8.4.
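
The selection can be sketched in Python as below. This is not a transcription of figure 8.4: wraparound is handled here with modular arithmetic on the offset from the birthnode, which is a substitute of our own for the figure's adjustment rule. The layout is also an assumption: ordinal node numbers are taken to be laid out row by row on a mesh of a given width, without wraparound links, and the names `hops` and `nearby_constituent` are hypothetical.

```python
def hops(a: int, b: int, width: int) -> int:
    """x-y hop count between two ordinal nodes on a row-major mesh
    (the layout is an assumption of this sketch)."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def nearby_constituent(current: int, birthnode: int, nodes: int,
                       constituents: int, width: int) -> int:
    """Return the node holding the closer of the two constituents that
    bracket the current node in ordinal node order."""
    if not 1 <= constituents <= nodes:
        raise ValueError("need between 1 and `nodes` constituents")
    stride = nodes // constituents
    # Distance past the birthnode, measured around the ring of ordinal numbers.
    offset = (current - birthnode) % nodes
    # Constituent at or before the current node; clamp for the tail of the ring.
    left_index = min(offset // stride, constituents - 1)
    # Next constituent, wrapping back to constituent 0 after the last one.
    right_index = (left_index + 1) % constituents
    left = (birthnode + left_index * stride) % nodes
    right = (birthnode + right_index * stride) % nodes
    # On a tie, min() keeps the first candidate, i.e. the left neighbour.
    return min((left, right), key=lambda node: hops(current, node, width))
```

For example, on a 4-wide, 16-node mesh with four constituents born on node 2 (so they live on nodes 2, 6, 10 and 14), a request from node 1 is bracketed by nodes 14 and 2 and is routed to node 2, one hop away.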



Chapter 9 



Issues From a Prototype System 



Keep thy heart with all diligence; 
for out of it are the issues of life 

— The Holy Bible, Proverbs 4:23 



This chapter discusses in some detail the relevant issues that arose in the design and
implementation of a prototype operating system. The following topics will be discussed:

• The sizing of the BRAT 

• How to handle a full translation table 

• The scarcity of virtual names 

• Out of memory problems 

• Queue size 

• Queues, stacks, and saving processor state 

These situations are troubling enough to require discussion. The actual prototype imple- 
mentation can be found in an appendix at the end of the thesis. Specifications of the system 
calls and message handlers can also be found in the appendices. 
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9.1 Sizing the BRAT 

To support the global virtual namespace, we use the Birth/Residence Address Table to 
hold the necessary translation bindings. This serves a purpose similar to a page table in 
a multi-level paged memory system, or a segment table in a segment addressable memory 
system. The BRAT needs to hold at least 

1. virtual -> physical mappings for objects residing on this node

2. virtual -> node number links for objects that were born on this node, but now reside
elsewhere

9.1.1 Memory Limitation 

But, due to the small amount of memory on each chip, we face a severe restriction on
the number of bindings that can be stored. After reserving room for system data structures,
operating system variables, and the heap, we are left with a paltry amount of memory for
the BRAT. This will directly limit the number of objects creatable on a node. We must
make a careful compromise between heap size and translation table entries. We must also be
able to purge entries from the table when objects are deleted, which calls for an efficient
storage reclamation strategy.

9.1.2 BRAT Use Scenarios 

Let's take a look at a few possible scenarios that can occur with object management. 

1. There is room left in the heap and the BRAT for more objects to be allocated. 

2. There is room left in the BRAT but no more room left in the heap. 

3. The heap contains many small objects that don't take up much room, but fill the 
BRAT, so that no more objects can be created. 

4. The heap can be nearly empty, but no more objects can be allocated because the
BRAT is full of entries of migrated objects.

The first case is the most desirable one; we wish we could have this happen all the time.
The second case is undesirable, but will probably happen reasonably often due to the small
memory space. This can be rectified by exporting objects to other nodes to free up heap
space. The third and fourth scenarios, however, occur because of lack of translation table
space due to the presence of large numbers of resident and/or migrated objects. It is these
two cases that we would like to minimize.

The prototype system that was developed assumed 1K of RAM per node. Of this
memory, 424 words were reserved for processor and OS data structures. Thus each processor
is left with only 600 words to be shared between the heap and the translation table. The
question that arises is how to partition the BRAT and the heap in a reasonable manner.

9.1.3 A Prototype Sizing Based On Average Object Size 

We have no measures as to object size in our system, but we might be able to suggest a
reasonable approximation of, say, 10 words per object¹. With 2 words of header for each
object, this would leave 8 words of object space. So, each object would take up 10 words
of heap space and 2 words of BRAT space, allowing 600/10 = 60 objects. But, we also need to
reserve room for bindings of objects born on this node, but now residing elsewhere. Let's
assume that we pick a limit for this, such as the total number of average-size objects that
could fit in the heap. This would allow us to migrate every object and STILL fill the heap
with average sized objects. This leaves us with the following equations.

heapsize + bratsize = freememory

residentobjects = heapsize / 10

migratedobjects = residentobjects

bratsize = 2 × (residentobjects + migratedobjects)

⇒ heapsize = 5/7 × freememory
⇒ bratsize = 2/7 × freememory

¹ Though of course this will depend greatly on the type of program being run.

With 600 words of free space, this leaves the following parameters. 

heapsize = 428 
bratsize = 172 

In a 4K RAM node, we might expect the following configuration to be reasonable.

heapsize = 2552 
bratsize = 1020 



In the prototype operating system, the BRAT size has been set at 128 words, rather than
172, for ease of implementation.
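
The arithmetic behind these figures can be checked with a few lines of Python. Substituting the object counts into the BRAT equation gives bratsize = 4 × heapsize / 10, so each average object accounts for 10 heap words plus 4 BRAT words, 14 words in all, which is where the 5/7 and 2/7 fractions come from. The rounding rule below (round the object budget to the nearest whole object and give the remaining words to the heap) is our assumption; it happens to reproduce the figures quoted above. The function name `split_memory` is hypothetical, and the 3572 words of free memory used for the 4K case is simply the sum of the two quoted 4K figures.

```python
from typing import Tuple


def split_memory(free_words: int, object_words: int = 10,
                 binding_words: int = 2) -> Tuple[int, int]:
    """Split free memory into (heapsize, bratsize).

    Each average object needs object_words of heap, one BRAT binding while it
    is resident, and one more binding reserved for when it has migrated.
    """
    words_per_object = object_words + 2 * binding_words   # 10 + 4 = 14
    objects = round(free_words / words_per_object)         # assumed rounding rule
    bratsize = 2 * binding_words * objects
    heapsize = free_words - bratsize
    return heapsize, bratsize


small_node = split_memory(600)    # the 1K prototype node
large_node = split_memory(3572)   # free memory implied by the 4K figures
```

With 600 free words the budget is 43 average objects, giving a 172-word BRAT and a 428-word heap.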

9.2 Running Out of Binding Space 

Sooner or later, with even our best efforts at insightful sizing of the BRAT, we will run 
out of room to make any bindings. There are several conceivable ways of resolving this 
situation. 

1. Throw up your hands and quit. 

2. Forward your allocation request to another node. 

3. Make the BRAT bigger. 

4. "Delegate" some of the bindings in the BRAT to another node. 

5. Change the hometown nodes of some virtual addresses to make other nodes responsible
for their bindings.

The current operating system implements choice 1 for the most part. There is also some
code to support choice number 2, but this is complicated by the fact that we might not be
able to allocate a context (as discussed in an upcoming section). If this mechanism could
be made to work, it might be acceptable enough, realizing that any system will break when
the nodes begin to run out of memory. Investment in a proper load-balancing policy
may alleviate this problem. The operating system also supports the resizing of the BRAT,
but because of the hashing mechanism currently used (described in an upcoming section),
arbitrary resizing of the BRAT is difficult to do.

The delegation of IDs is possible, but requires some thought. We need a way to
specify which IDs are delegated to which nodes, and this should take significantly less
storage than would be required to actually store the bindings. We could delegate ranges of
IDs to a node, but this node must have room for the range, and when this new node runs out
of room, it must also be able to delegate. This is a possibility for the future. The fifth
item in the list, changing the birthnodes of virtual addresses, would be very expensive,
requiring some synchronization and a large broadcast of messages. But perhaps this could be
done during the garbage collection phase, or offline, or at the end of the day as a
background job (given a suitably large machine).

9.3 Scarcity of IDs 

As a related issue, given the virtual ID format of 16 bits of birthnode and 16 bits of serial 
number, each node can only generate 65536 IDs. In the current system, it is likely that 
many applications would run through this ID space in a fantastically short amount of time. 
Of course, the time is dependent on the applications that are run, but we can sketch a rough
estimate for how long we can run before running out of IDs on a node. 

The following calculations assume a 10 MHz processing node where the average
instruction takes 1.5 cycles. We assume that the queue is always full of work to be
done. We assume that each message-spawned task will be 200 instructions long (far
above the likely amount). We finally assume that only 10% of the tasks that come in will
involve an allocation of an object.

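\[
\frac{10^{7}\ \text{cycles}}{\text{second}} \times
\frac{1\ \text{instruction}}{1.5\ \text{cycles}} \times
\frac{1\ \text{task}}{200\ \text{instructions}} \times
\frac{0.1\ \text{allocations}}{\text{task}}
\approx 3.3 \times 10^{3}\ \frac{\text{allocations}}{\text{second}}
\]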

At this rate, a node would run out of IDs in about 20 seconds (65536 IDs at roughly 3,300
allocations per second). Though these numbers are questionable at best in the absence of
actual measurements, it is quite clear that the ID space is completely inadequate. We have
to have a larger virtual ID, say by having 68 bit words rather than 36 bit words, but in
the meantime it might suffice to (1) borrow bits from the node number field or (2) attempt
to reuse certain IDs. Borrowing bits would be a short-term solution: by limiting our
prototype machine to a 1K machine, we could get a 64 fold increase in serial numbers,
allowing a node to run for 20 minutes with the assumptions made above. But, for
simplicity's sake, the current implementation has not adopted this format. It would be a
good idea to do this in the future until we build a machine with larger words.
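
The estimate above is easy to replay with different assumptions. The following is a minimal sketch, not part of the prototype; the function name and its parameters are hypothetical, and the defaults simply restate the assumptions listed in this section.

```python
def seconds_until_ids_exhausted(serial_bits=16,
                                clock_hz=10_000_000,
                                cycles_per_instruction=1.5,
                                instructions_per_task=200,
                                allocating_fraction=0.10):
    """Rough time before one node uses up all of its serial numbers."""
    allocations_per_second = (clock_hz / cycles_per_instruction
                              / instructions_per_task * allocating_fraction)
    return (2 ** serial_bits) / allocations_per_second


# 16 serial-number bits, as in the current virtual ID format.
base = seconds_until_ids_exhausted()
# A 1K-node machine needs only 10 node bits, leaving 22 bits of serial number.
borrowed = seconds_until_ids_exhausted(serial_bits=22)

print(f"16-bit serial numbers: {base:.1f} seconds")
print(f"22-bit serial numbers: {borrowed / 60:.1f} minutes")
```

Under these assumptions the 16-bit case lasts on the order of 20 seconds and the 22-bit case on the order of 20 minutes, in line with the figures quoted above.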

The second idea is a more interesting research issue. We already reuse context
IDs by requiring contexts to have received all replies before they are put on the free list.
This way, the number of IDs reserved for contexts (probably the most frequently allocated
object) is significantly cut. There may also be ways of reusing normal object IDs, but a
space-efficient way of noting these reused IDs may be difficult to find. Here are a few
possible ideas on how to reuse IDs.

1. Keep a fixed size table of free IDs. When an object is freed, the ID will be placed in
the table. When an ID is needed, this free table will first be checked. The biggest
problem with this approach is that when the table fills, IDs will not be placed in the
table and they will be "lost" forever.

2. Provide a separate routine for allocating "short-lived" objects. These objects would
take their IDs from a common, fixed-size pool of consecutive IDs whose freeness could
be signified by a single bit for each ID. For example, we might reserve 256 "short-lived"
IDs per node. The short-lived IDs' serial numbers might range from 0 to 255
and the pool could be represented by 8 32 bit words signifying an array of 256 bits,
where a 0 indicates that the ID is in use, and a 1 indicates that it is free. If these
objects are truly short-lived, and they represent the bulk of ID requests, then this
approach might greatly extend the lifetime by conserving regular IDs.

3. Every now and then, perform an ID "garbage collection and compaction" where all
IDs are renamed to consecutive IDs, in effect compacting the ID space. This involves
similar issues to the mechanism of changing an ID's hometown node number. It seems
to be very expensive, but it may be possible to interleave this with the normal garbage
collection.

The currently implemented mechanism only reuses context IDs (a fixed amount). No attempt
is currently made to reuse other objects' IDs.
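
The second idea in the list (a bitmap-tracked pool of short-lived IDs) can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and nothing like it exists in the current prototype. It follows the encoding suggested above, with a 1 bit meaning free and a 0 bit meaning in use.

```python
class ShortLivedIdPool:
    """Consecutive serial numbers 0..size-1, one bit each (1 = free, 0 = in use)."""

    WORD_BITS = 32

    def __init__(self, size=256):
        if size <= 0 or size % self.WORD_BITS:
            raise ValueError("size must be a positive multiple of 32")
        self.size = size
        # 256 IDs fit in 8 words of 32 bits; every ID starts out free.
        self.words = [0xFFFFFFFF] * (size // self.WORD_BITS)

    def allocate(self):
        """Return the lowest free serial number, or None if the pool is empty."""
        for index, word in enumerate(self.words):
            if word:
                bit = (word & -word).bit_length() - 1   # lowest set bit
                self.words[index] = word & ~(1 << bit)  # mark as in use
                return index * self.WORD_BITS + bit
        return None

    def free(self, serial):
        """Give a serial number back to the pool."""
        if not 0 <= serial < self.size:
            raise ValueError("serial number outside the short-lived range")
        index, bit = divmod(serial, self.WORD_BITS)
        if (self.words[index] >> bit) & 1:
            raise ValueError("serial number is already free")
        self.words[index] |= 1 << bit
```

When `allocate` returns `None`, a caller would presumably fall back to the regular (non-reusable) ID generator; that policy is left open here.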



9.4 The Shortage of Memory 

Of course, the scarcity of memory per node will also prove to be a problem. The goal 
is to take advantage of the large collective memory provided by the system (a 4096 node
J-Machine with 4K words of memory per node would have 16 megawords of primary memory). Load
balancing can be used not only in choosing processors to perform work, but also in choosing 
nodes to allocate memory from. Simple gradient plane approaches [RF87] can be used 
to cool down memory "hot spots". Garbage collection, expanded memory nodes, and the 
sweeping of "dusty" objects to offline storage are all possible solutions to the memory 
shortage problem. 

The current prototype operating system kernel takes two approaches to memory. 
If a message arrives to allocate an object, and there is not enough memory available, the 
message is forwarded to another node. However, if a process has been running for a while 
and the node runs out of memory, the calling message cannot simply be forwarded, since 
some work has already taken place. Instead, the process must have its state saved in a 
context, and room must be made on this node by evicting certain objects. Unfortunately, 
there might not be enough memory to allocate a context. A solution out of this trap is to 
require that there always be one minimal sized context object available for each priority 
level. A check could be made in the CALL and SEND handlers (and any other message 
handlers that could fall into these circumstances) for a free context. 

9.5 Queue Size

Queue sizing also proves to be a problem in the system. Since we want to be able to migrate
objects by message sends, an empty queue must always be big enough to hold every object.
This means that the queue must be as big as the entire heap. This is far too costly in
terms of memory in the 1K node prototype, and we have not attempted to make a fix. It would
always be possible, though admittedly tedious, to send messages in "chunks" that would be
able to fit in the queues.
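
A minimal sketch of the "chunking" idea follows, assuming each chunk carries its own offset so that the receiver can reassemble the object in any arrival order. The function names and the 3-word per-message header are hypothetical; the prototype does not implement this.

```python
def chunk_object(words, queue_words, header_words=3):
    """Split an object's words into (offset, payload) pieces that fit in a queue."""
    payload_words = queue_words - header_words
    if payload_words <= 0:
        raise ValueError("queue too small to carry any payload")
    return [(offset, words[offset:offset + payload_words])
            for offset in range(0, len(words), payload_words)]


def reassemble(chunks, total_words):
    """Rebuild the object on the receiving node from chunks in any order."""
    obj = [None] * total_words
    for offset, payload in chunks:
        obj[offset:offset + len(payload)] = payload
    return obj


# A 100-word object sent through a 32-word queue.
original = list(range(100))
pieces = chunk_object(original, queue_words=32)
rebuilt = reassemble(reversed(pieces), len(original))
```

The price is the tedium mentioned above: every migration handler must track partially received objects until the last chunk arrives.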

9.6 Suspension and Processor State 

Whenever a process suspends and plans on restarting later, it must be able to save its
processor state. This normally means its register set, but we must not forget about two
other forms of processor state, queues and stacks. When we suspend and there is a message
we want to save in the queue, we copy it out into a heap object and set the message pointer
to point to the object instead of the queue. Stacks are more of a difficulty to save and
restore, and we have decided to explicitly prohibit the saving of stack frames. So, the
operating system is given the task of ensuring it will never have to suspend and restart
with information on the stacks. This was a source of much personal misery during the
implementation of the OS (though certainly less than there would have been without the
existence of stacks).

9.7 Summary 

This chapter has touched on just a few of the difficulties in the design of the Jellybean 
Operating System Software. Some are due to inadequacies in hardware or scale, some are 
due to lack of behavioral measurements, and some due to lack of insight. These will most 












Uktly b«com* tiuwmiUar 







Chapter 10 



Performance Evaluation 



Never promise more than you can perform. 
— "Publilius Syrus", Maxim 528 

This chapter provides a quantitative performance evaluation of several important 
system services. Though the prototype implementation is certainly not optimal in any way, 
it should be a reasonable approximation of an actual working operating system kernel, and 
as such, the numbers presented in the chapter should be useful for the design and tuning 
of the rest of the Jellybean system. In addition, we should be able to see what parts of the 
system need fixing, before the machine is fabricated. 

10.1 The Virtual Binding Tables 

The virtual name manager is composed of five system routines nested in the hierarchy
shown in figure 10.1. The BRAT itself is a 128 word binding table holding 64 2-word
bindings. Bindings are entered by a linear probing [Sed83] scheme where a hash function
determines the first choice for the location of the binding, and a linear search is performed
from there. This linear search can take a significant amount of time (at least on the scale
of average task size), so we need (1) an efficient algorithm and (2) a successful hashing
scheme. The remainder of this section examines the execution time of each BRAT routine
and presents some very preliminary hashing measurements.

[Figure 10.1: The Hierarchy of the Virtual Name Manager. Routines visible in the figure:
BRAT_ENTER, BRAT_ENTER_NEW, BRAT_XLATE, BRAT_PEEK.]



10.1.1 Instruction Counts 

The BRAT_PEEK system call is the core of all of the virtual name services. It takes a
key to hash and a data word to match (not necessarily the same, since you might want to
look for the first NIL slot where a certain key could be placed, as is done when adding new
entries). The key is hashed, providing the index into the table, and a linear search with
wraparound proceeds from there. The cost of this call is between 22 and 540 instructions,
depending on how far the search has to progress. A reasonable cost approximation, C_peek,
for a search that finds the data in the nth slot is 22 + 8 × (n − 1) steps.

The rest of the BRAT calls use this BRAT_PEEK routine.

• BRAT_XLATE looks up a binding in the BRAT and takes 27 + C_peek steps to complete.

• BRAT_PURGE searches the BRAT until it finds the first binding of the specified
word, and removes it from the table. This takes 30 + C_peek steps to complete.

• BRAT_ENTER_NEW adds a new entry to the BRAT without first removing any
previous bindings. It accomplishes its task in 32 + C_peek steps.

• The most expensive routine, potentially, is the BRAT_ENTER routine. This is
like BRAT_ENTER_NEW, but it first removes a previous binding, requiring another
BRAT search. This can take as much as 32 + 2 × C_peek steps.
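
The instruction counts above can be collected into a small cost model. This is only a sketch that restates the quoted formulas; the Python function names are hypothetical, and the BRAT_ENTER figure is the upper bound in which both searches examine n slots.

```python
def c_peek(n):
    """Approximate cost of a BRAT_PEEK that finds its data in the n-th slot."""
    if n < 1:
        raise ValueError("at least one slot is always examined")
    return 22 + 8 * (n - 1)


def brat_xlate(n):
    return 27 + c_peek(n)


def brat_purge(n):
    return 30 + c_peek(n)


def brat_enter_new(n):
    return 32 + c_peek(n)


def brat_enter_max(n):
    """Upper bound: the purge search and the enter search both take n slots."""
    return 32 + 2 * c_peek(n)


for slots in (1, 5, 10):
    print(slots, c_peek(slots), brat_xlate(slots), brat_enter_max(slots))
```

Since every routine is dominated by C_peek once n grows beyond a few slots, the average search distance is the quantity worth measuring.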

10.1.2 Effectiveness of Linear Probing 

Evidently, the crucial factor in the effectiveness of the BRAT routines is the cost of peeking
through the BRAT, C_peek, which is a linear function of how far away from the expected hash
spot the value resides. What the average distance in hash steps will be for a typical machine
depends greatly on (1) the application that is being run, (2) how storage reclamation is
handled, and (3) what is done when the BRAT overflows. All of these issues need further
study. Nonetheless, I would like to proceed with an informal, ad hoc analysis, based on
reasonable estimates and educated guesswork. The rationale is to see if the linear probing
strategy seems to generally work, meaning that the average number of steps taken until the
entry is found is small.¹

¹ It is not obvious that this will be so. In fact, it is quite easy to be concerned that this
linear rehashing approach might actually work itself into a steady state where entries are
always very far away from where they are supposed to be.

The following data was generated by a simulation program called bratsim that takes
an input pattern of references and simulates their effect on the BRAT. The size and
maximum fullness of the BRAT are specifiable. The simulator takes each reference and looks
it up in the BRAT.

• If the reference is in the BRAT, it records the number of steps away from where it 
should be. 

• If the reference is not in the BRAT, it is entered as soon as possible after its hashed 
spot. 

• When names get entered, some may be arbitrarily deleted to maintain a maximum
full percentage.

• If the BRAT fills, a random slot will be emptied. 

The reference pattern generator is also based on initial approximations, generating patterns
that seem likely in the applications we envision running. It is currently configured with the
following parameters: 10% new IDs, 20% context IDs, 35% recent IDs to simulate locality,
20% less local IDs, and 15% very random IDs to simulate class/selector bindings, method
IDs and other references following less of a pattern. I would expect this estimate to be
conservative.
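
The flavor of such a simulation can be conveyed with a much simpler stand-in. The sketch below is not bratsim: it assumes uniformly random 16-bit keys (no locality classes), a modulo hash, and random eviction to hold the table at a maximum fullness. All names and parameters are hypothetical.

```python
import random


def average_enter_distance(rows=64, max_fullness=0.5, references=5000, seed=1):
    """Average number of slots past the hashed slot at which new keys land."""
    rng = random.Random(seed)
    table = [None] * rows
    limit = max(1, int(rows * max_fullness))
    used = 0
    total_distance = 0
    entries = 0
    for _ in range(references):
        key = rng.randrange(1 << 16)
        home = key % rows                      # stand-in hash function
        probe = [(home + step) % rows for step in range(rows)]
        if any(table[slot] == key for slot in probe):
            continue                           # already bound, nothing to enter
        while used >= limit:                   # evict random bindings to make room
            victim = rng.randrange(rows)
            if table[victim] is not None:
                table[victim] = None
                used -= 1
        for distance, slot in enumerate(probe):
            if table[slot] is None:            # first free slot after the hashed one
                table[slot] = key
                used += 1
                total_distance += distance
                entries += 1
                break
    return total_distance / entries


for percent in (20, 50, 70, 90):
    print(percent, round(average_enter_distance(max_fullness=percent / 100), 2))
```

Even this crude model shows the average distance growing as the permitted fullness rises, though its absolute numbers should not be compared with the bratsim measurements.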

Based on these estimates, and the reclamation model presented above, we can chart 
how many steps away from the hashed slot particular IDs land when they are entered. For a 
64 word table, this is graphed in figure 10.2. We see an asymptotic function relating BRAT 
space used and the locality of entries to their intended slots. For the 64 row example, the 
system begins to be unmanageable after the BRAT becomes more than 60 - 70% full. 

Figure 10.3 shows the effect of doubling the BRAT size. The trend is still rapidly 
increasing, but the gains we get in terms of object storage may outweigh the extra steps 
involved in lookup. The flatness of the middle portion, from 40 - 60% hints at a desirable 
operating region. 

So, now I would like to suggest educated guesses at the answers to the following two
questions.

1. How full should we allow the BRAT to get?

2. How large should the BRAT be?

[Figure 10.2: 64 Row BRAT Enter Distances from Hashed Slot. Horizontal axis: Maximum
Percentage of BRAT Space Used (64 Rows), from 10 to 100.]

In the last few paragraphs, I indicated the severity of the BRAT filling problem. After 70% 
capacity, the BRAT's performance becomes intolerable. For this reason, I suggest that 70% 
capacity should be an absolute maximum for BRAT size, and the normal operating size 
should not usually exceed 50%. I propose this as the answer for question 1. 

Question number 2 can be answered by adapting the analysis presented in the last 
chapter. The new constraint equations become. 

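\begin{align*}
\mathit{heapsize} + \mathit{totalbratsize} &= \mathit{freememory} \\
\mathit{residentobjects} &= \frac{\mathit{heapsize}}{10} \\
\mathit{migratedobjects} &= \mathit{residentobjects} \\
\mathit{bratspaceused} &= 2\,(\mathit{residentobjects} + \mathit{migratedobjects}) \\
\mathit{bratspaceused} &= 0.7 \times \mathit{totalbratsize} \\
\Rightarrow \mathit{totalbratsize} &= \tfrac{4}{11} \times \mathit{freememory} \\
\Rightarrow \mathit{heapsize} &= \tfrac{7}{11} \times \mathit{freememory}
\end{align*}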

With 600 words of free space, this reserves 218 words for the BRAT and 382 words for the 
heap. This will hopefully be a more accurate value, though it is not a power of 2, which 
will complicate the hashing slightly. 
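
The sizing rule can be parameterized so that other object sizes or fullness limits are easy to try. This is a sketch under the same assumptions as the equations above; the function name and its keyword arguments are hypothetical.

```python
from fractions import Fraction


def size_heap_and_brat(free_words,
                       words_per_object=10,
                       words_per_binding=2,
                       migrated_per_resident=1,
                       max_fullness=Fraction(1)):
    """Split free memory into (heap, BRAT) words, keeping the BRAT under max_fullness."""
    # BRAT words needed per heap word, inflated so the table is never too full.
    brat_per_heap = Fraction(words_per_binding * (1 + migrated_per_resident),
                             words_per_object) / max_fullness
    heap = free_words / (1 + brat_per_heap)
    return heap, free_words - heap


heap, brat = size_heap_and_brat(600, max_fullness=Fraction(7, 10))
print(round(heap), round(brat))
```

With `max_fullness` left at 1 the same function gives the 5/7 and 2/7 split of the earlier analysis.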

The efficient manipulation of the BRAT is crucial to the success of the Jellybean 
system. Future study is needed to evaluate hashing functions, and perhaps a form of linear 
re-hashing is desired, where the first hash is followed by a subsequent number of other 
hashes instead of a linear search. In addition, once real applications are run, we can get a 
better idea how the system will behave. Likewise, the translation buffer performance needs 
analysis, as this will indicate how often BRAT lookup occurs. 

10.2 Object Allocation

A common task of the Jellybean Operating System Software is to allocate objects from the
heap. This section will examine how costly this operation can be.

Figure 10.4 describes the nesting of services required to perform the NEW system
call. The ALLOC routine takes 24 instructions, it takes 19 instructions to generate a new
ID, and it takes 32 + C_peek instructions to enter a new ID into the BRAT. With 20
instructions for inter-module glue, the NEW system call takes 95 + C_peek instructions.
According to the BRAT analysis results, if we operate at less than 70% full, we will have to
take fewer than 10 steps to enter a new ID. Taking n = 10 gives C_peek = 22 + 8 × 9 = 94
steps, and therefore NEW should take 95 + 94 = 189 instructions. At best, with 0 steps to
search, the NEW call would take 95 + 22 = 117 steps.



10.3 Context Allocation 

Another commonly executed routine is the NEW_CONTEXT system call. As described in
chapter 5, this service was expected to be expensive enough to merit special treatment. The
context free list was developed to provide a pool of pre-allocated contexts for fast context
allocation. The flowchart in figure 10.5 shows the steps taken by the routine. Note that if
the requested context is of an abnormal size, or if there are no pre-allocated contexts on
the free list, the NEW routine is called to allocate a new object. Requesting an abnormally
sized context takes 25 + C_new instructions, allocating a context when none are on the free
list takes 27 + C_new instructions, but allocating a context off the free list takes only 20.
If we can keep contexts in the pool, we will do well.

Freeing contexts is also fast, taking only 25 instructions. This is only about 10%
of the time it used to take to perform this operation, when we were required to purge the
old context ID, generate a new one, and place the new ID in the context and BRAT. By
preventing late replies to contexts, we have prevented this performance loss.

[Figure 10.4: Nesting of Services for the NEW System Call. Routines visible in the figure:
NEW, GENID, BRAT_ENTER_NEW, BRAT_PEEK.]

[Figure 10.5: Flowchart for the NEW_CONTEXT System Call]
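
The three allocation paths and their quoted costs can be mimicked with a small model. This is an illustrative sketch, not the ROM code: the class name, the 13-word normal context size, and the choice to charge the best-case C_new of 117 instructions are all assumptions made for the example. The three pre-allocated contexts match what the boot code sets up.

```python
class ContextPool:
    """Free list of pre-allocated contexts with a running instruction count."""

    def __init__(self, preallocated=3, normal_size=13, c_new=117):
        self.normal_size = normal_size
        self.c_new = c_new                    # assumed cost of the NEW system call
        self.free_list = [[None] * normal_size for _ in range(preallocated)]
        self.instructions = 0

    def new_context(self, size=None):
        size = self.normal_size if size is None else size
        if size != self.normal_size:          # abnormal size: always call NEW
            self.instructions += 25 + self.c_new
            return [None] * size
        if self.free_list:                    # fast path: reuse a pooled context
            self.instructions += 20
            return self.free_list.pop()
        self.instructions += 27 + self.c_new  # pool empty: fall back to NEW
        return [None] * size

    def free_context(self, context):
        self.instructions += 25
        if len(context) == self.normal_size:  # only normal contexts are pooled
            self.free_list.append(context)


pool = ContextPool()
for _ in range(4):
    pool.new_context()
print(pool.instructions)
```

The fourth allocation in this usage example misses the pool, so it alone costs more than the first three combined.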

10.4 Boot Code and Message Handlers 

Let's conclude the chapter with a brief discussion of the complexity of the Bootstrap code 
and several message handlers. The boot code is run when each processor is powered up, 
and places the processor in a runnable state. All together, it takes 5005 steps to boot the 
processor. This is made up of 4103 steps to erase the memory, 481 steps to initialize the 
context free list with 3 contexts, 247 steps to fill the exception vector table, 86 steps to fill 
the extended call table and 72 steps to set up the stacks, queues and other values. 

The WRITE message handler takes 8 + 7 × l + 3 steps to send l words of data. The
READ message handler takes 8 steps to read an empty message, or 7 + 5 × (l − 1) steps to
read a block of data of length l.

The CALL message handler can exhibit several possible times. If the method being
CALLed is local and its translation is in the cache, it only takes 6 instructions to start it
executing. If the method is local, but not in the cache, it takes 64 + C_peek steps, because
the XLATE exception handler takes 58 + C_peek steps to complete. If the method is not
local, message sends are involved, making it more difficult to analyze.

10.5 ROM Size 

Out of the 1024 words reserved for ROM, the operating system prototype uses 760. 




10.6 Summary 

This section presented a brief performance evaluation of several important parts of the 
Jellybean system. In addition to analyzing the cost of routines, several more fundamental 
issues were noticed. These are itemized below. 

• The BRAT needs to be searched efficiently. The linear probing method used can take 
a significantly long time if values get placed far from their intended position. 

• Based on preliminary simulation, the performance becomes unacceptable when the
BRAT gets to 60 to 70 percent full. We can choose a maximum fullness, and derive
the BRAT and heap sizes based on the fullness value and the expected size of objects.

• We note that even with an insightful configuration of the BRAT, a translation cache 
is required. The configuration of the cache is left to further study. 

• Creating a new object is more expensive than we would like (a minimum of 117
instructions). This could be optimized with clever coding, but not much more performance
could be gained in this manner. The problem is more fundamental, resting on the
performance of the cache and the BRAT lookup.

• The caching of free contexts seems to work well. Creating a new context requires 
only 20 instructions if there is a context on the free list (and assuming we don't get 
a translation fault). This is compared to a minimum of 144 instructions without a 
context on the free list. Freeing a context is also fast, only 25 instructions. 

• Calling a method takes only 6 instructions if the method is local and its translation
is in the cache. If it is not in the cache, performance again suffers, requiring a
minimum of 86 instructions.

Table 10.1 summarizes some of the more important performance statistics presented in this 
chapter. 
Routine          Instruction Count             Notes

BRAT_PEEK        C_peek = 22 + 8 × (n − 1)     n = slots to search
BRAT_XLATE       27 + C_peek
BRAT_PURGE       30 + C_peek
BRAT_ENTER_NEW   32 + C_peek
BRAT_ENTER       32 + 2 × C_peek               maximum
ALLOC            24
GENID            19
NEW              95 + C_peek
NEW_CONTEXT      20                            with context on free list
                 27 + C_new                    no context on free list
FREE_CONTEXT     25
CALL_MSG         6                             with method ID in cache
                 64 + C_peek                   method ID not in cache

Table 10.1: Timings for Common System Services
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Conclusions 



All's well that ends well 

Shakespeare, in All's Well That Ends Well IV 

There is a time for many words, 
and there is also a time for sleep. 

— Homer, in The Iliad, XI 



11.1 Summary 

The Jellybean Operating System Software is a prototype operating system kernel for the
Jellybean Machine. Its duties include object-based storage allocation, virtual distributed
naming, object migration, process definition and control, local and remote process
execution, and the support of an object-oriented calling model.

This thesis described the JOSS in some detail, including its successes and weaknesses. The
report also discusses issues in the future Jellybean operating system that were not
implemented in the prototype because of lack of support, study and time. These include
storage reclamation, resource distribution bureaucracies, and distributed objects. These
will most likely become important parts of the Jellybean operating environment in the
future.

Several deficiencies may exist in the current system. Performance-wise, searching
the translation table may well be too slow. Several solutions can be proposed including (1)
increasing the size of the BRAT and decreasing the fullness, (2) experimenting with various
hashing functions and (3) providing an effective translation buffer. Memory shortages may
prove a significant problem, and this will place an extra burden on reclamation attempts,
which are already made difficult because of the problem of travelling references.

On the other hand, if the cache works well, and if the BRAT is not very full, the
whole system seems to perform admirably. Method invocations are powerful but fast. The
context free list allows rapid creation and reuse of contexts. The global naming system and
migration provide a high degree of flexibility.

11.2 Suggestions for Further Study 

This thesis scratched the surface of many interesting research issues, many of which I for 
one would be eager to investigate. 

In the area of performance evaluation, the configuration and simulation of the
translation buffer and BRAT in a real-life environment are important to the success of the
Jellybean Machine. Also of practical as well as theoretical interest would be the study and
evaluation of distribution hierarchies and the various ways of handling virtual hints.

Reclamation is an important potential area of research. An efficient mechanism to
collect garbage over a distributed network would be of general interest as well, especially if
some incremental form of collection can be developed. Policies for handling out of memory
conditions on processing nodes are also attractive, involving selective migration of objects.

Finally, load and resource balancing policies need to be investigated, especially since
each processor can quickly become overwhelmed (being limited in power and memory
capacity). Simple gradient plane approaches might be attempted, where load spreads to
where it is lower. Network analysis will also be an important factor.

11.3 Hopes 

The Jellybean Machine has the potential of being an important step in the development of 
multicomputer networks. It is my hope that further study will be encouraged so that the 
difficulties of machines of this genre can be resolved (memory shortages, expensive name 
translation, no caching of mutable objects, need for resource balancing, etc.) and they can 
show their benefits as scalable, programmable processors. 
Designed and Implemented by the members of the Concurrent
VLSI Architecture Group at the Massachusetts Institute of
Technology.

Copyright (C) 1986, 1987 Massachusetts Institute of Technology

ALL RIGHTS RESERVED

No copy of this source code may be made by any means,
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><> the Massachusetts Institute of Technology. • «» 
OS.MDP

This file contains operating system labels & stuff

Useful system values



LABEL SYS_LEN_BITS          10
LABEL SYS_LEN_MASK          X1111111111
LABEL SYS_ID_NODE_BITS      16
LABEL SYS_ID_ID_BITS        16
LABEL SYS_ID_ID_MASK        X1111111111111111
LABEL SYS_ID_NODE_MASK      X1111111111111111
LABEL SYS_CLASS_MASK        X1111111111111111
LABEL SYS_CLASS_BITS        16
LABEL SYS_SELECTOR_MASK     X1111111111111111
LABEL SYS_SELECTOR_BITS     16
LABEL SYS_OP0_BITS          7
LABEL SYS_OP1_BITS          2
LABEL SYS_OP2_BITS          2
LABEL SYS_OP0_MASK          X1111111
LABEL SYS_UNCHECKED         (1<<31)
LABEL SYS_UNC               SYS_UNCHECKED
LABEL SYS_ADSHADOW          (1<<8)
LABEL SYS_ABS               SYS_ADSHADOW
LABEL SYS_INVADR            (1<<30)
LABEL SYS_MARK_MASK         (1<<31)
LABEL SYS_COPY_MASK         (1<<30)
LABEL SYS_REL_MASK          (1<<31)
LABEL SYS_UNMOVABLE_MASK    (1<<29)

XLATE Modes

LABEL XLATE_OBJ
LABEL XLATE_ID_TO_NODE
LABEL XLATE_METHOD
LABEL XLATE_LOCAL
Temporary locations

LABEL TEMP0
LABEL TEMP1
LABEL TEMP2
LABEL TEMP3
LABEL TEMP4
LABEL TEMP5
LABEL TEMP6

Memory Map



LABEL OS_P0_TEMPS_BASE          0
LABEL OS_P0_TEMPS_LENGTH        8
LABEL OS_P1_TEMPS_BASE          8
LABEL OS_P1_TEMPS_LENGTH        8
LABEL OS_EVECTORS_BASE          16
LABEL OS_EVECTORS_LENGTH        48
LABEL OS_P0_STACK_BASE          64
LABEL OS_P0_STACK_LENGTH        32
LABEL OS_P1_STACK_BASE          96
LABEL OS_P1_STACK_LENGTH        32
LABEL OS_QUEUE0_BASE            128
LABEL OS_QUEUE0_MASK            127
LABEL OS_CACHE_BASE             256
LABEL OS_CACHE_MASK             63
LABEL OS_QUEUE1_BASE            320
LABEL OS_QUEUE1_MASK            31
LABEL OS_VARS_BASE              352
LABEL OS_VARS_LENGTH            16
LABEL OS_MCACHE_BASE            376
LABEL OS_MCACHE_LENGTH          32
LABEL OS_XVECTORS_BASE          408
LABEL OS_XVECTORS_LENGTH        16
LABEL OS_LOCKED_BASE            424
LABEL OS_LOCKED_LENGTH
LABEL OS_INITIAL_BRAT_LENGTH    128
LABEL OS_INITIAL_BRAT_MASK      (OS_INITIAL_BRAT_

Locations of OS Variables

LABEL VAR_FREETOP               OS_VARS_BASE + 0
LABEL VAR_BRAT_BASE             OS_VARS_BASE + 1
LABEL VAR_BRAT_LENGTH           OS_VARS_BASE + 2
LABEL VAR_BRAT_HASH_MASK        OS_VARS_BASE + 3
LABEL VAR_ROM_START             OS_VARS_BASE + 4
LABEL VAR_NEXT_ID               OS_VARS_BASE + 5
LABEL VAR_LAST_ID               OS_VARS_BASE + 6
LABEL VAR_MCACHE_BASE           OS_VARS_BASE + 7
LABEL VAR_MCACHE_LENGTH         OS_VARS_BASE + 8
LABEL VAR_MCACHE_OVERFLOW_LIST  OS_VARS_BASE + 9
LABEL VAR_CFREE_LIST            OS_VARS_BASE + 10
LABEL VAR_HEAP_BASE             OS_VARS_BASE + 11
LABEL VAR_NET_WIDTH             OS_VARS_BASE + 12
LABEL VAR_NET_HEIGHT            OS_VARS_BASE + 13

Tag Values

LABEL TAG_SYM       0
LABEL TAG_INT       1
LABEL TAG_BOOL      2
LABEL TAG_ADDR      3
LABEL TAG_IP        4
LABEL TAG_MSG       5
LABEL TAG_A         6
LABEL TAG_B         7
LABEL TAG_C         8
LABEL TAG_D         9
LABEL TAG_E         10
LABEL TAG_F         11
LABEL TAG_CS        TAG_D
LABEL TAG_OBJHEAD   TAG_E
LABEL TAG_OBJID     TAG_F
LABEL TAG_INST0     12
LABEL TAG_INST1     13
LABEL TAG_INST2     14
LABEL TAG_INST3     15

Exception Vector Locations


LABEL EVECTORBASE      OS_EVECTORS_BASE
LABEL FAULT_BKGD       EVECTORBASE
LABEL FAULT_DBLFAULT   EVECTORBASE + 1
LABEL FAULT_ILGINST    EVECTORBASE + 2
LABEL FAULT_ILGADRMD   EVECTORBASE + 3
LABEL FAULT_ACCESS     EVECTORBASE + 4
LABEL FAULT_EARLY      EVECTORBASE + 5
LABEL FAULT_LIMIT      EVECTORBASE + 6
LABEL FAULT_INVADR     EVECTORBASE + 7
LABEL FAULT_MSG        EVECTORBASE + 8
LABEL FAULT_QUEUE      EVECTORBASE + 9
LABEL FAULT_SEND       EVECTORBASE + 10
LABEL FAULT_XLATE      EVECTORBASE + 11
LABEL FAULT_RANGE      EVECTORBASE + 12
LABEL FAULT_PUSH       EVECTORBASE + 13
LABEL FAULT_POP        EVECTORBASE + 14
LABEL FAULT_OVERFLOW   EVECTORBASE + 16
LABEL FAULT_TYPE       EVECTORBASE + 17
LABEL FAULT_IA         EVECTORBASE + 18
LABEL FAULT_IB         EVECTORBASE + 19
LABEL FAULT_IC         EVECTORBASE + 20
LABEL FAULT_ID         EVECTORBASE + 21
LABEL FAULT_IE         EVECTORBASE + 22
LABEL FAULT_IF         EVECTORBASE + 23

Classes

LABEL CLASS_CONTEXT    1
LABEL CLASS_METHOD     2
LABEL CLASS_MESSAGE    3
LABEL CLASS_INT        512

System Call Values



LABEL TRAP NEW CONTEXT 

LABEL TRAP FREE CONTEXT 

LABEL TRAP XFER _ IO 

LABEL TRAP XFER~AODR 

LABEL TRAP 10 TO NODE 

LABEL TRAP~NEW 

LABEL TRAP _ MALLOC 

LABEL TRAP"GENID 

LABEL TRAP VERSION 

LABEL TRAP~BRAT PEEK 

LABEL TRAP SWEEP 

LABEL TRAP_FREE_SPECIFIE0_CONTEXT 

LABEL TRAP XCALL 
LABEL TRAPJ)IE 

««t»sttttf ***«*«><<* 

Extended Call Values 
■iiiiiiiiiiiiiiiiiu 



LABEL XCALL BRAT ENTER 

LABEL XCALL_BRAT~XLATE 

LABEL XCALL_BRAT~PURGE 

LABEL XCALL_MIGRATE OBJECT 

LABEL XCALL_BRAT_ENfER_NEW 
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Object Field Offsets 

>»««<•<!>!• tUSIIOS 





1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

14 
15 



LABEL   OBJECT_HDR
LABEL   OBJECT_ID
LABEL   CONT_PSTATE_OFFSET
LABEL   CONT_NEXT_CONTEXT
LABEL   CONT_RESOURCE
LABEL   CONT_NORMAL_SIZE

LABEL   PSTATE_ID0
LABEL   PSTATE_ID1
LABEL   PSTATE_ID2
LABEL   PSTATE_ID3
LABEL   PSTATE_R0
LABEL   PSTATE_R1
LABEL   PSTATE_R2
LABEL   PSTATE_R3
LABEL   PSTATE_IP

LABEL   CONT_PSTATE_SIZE
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Handler IDs 
iiiii 




1 

2 
3 

4 

13 


1 
2 
3 
4 
5 
6 
7 
8 



LABEL   HANDLER_INSTALL_METHOD  TAG_OBJID:0
LABEL   HANDLER_LOOKUP_METHOD   TAG_OBJID:1
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ROM. MOP 

This file contains system kernel routines for the MDP ROM.
Edit History (started 6/23/87)

Who Date What 



Brl 



6/23/87 



Br1 
Brl 


6/24/87 
6/26/87 


Brl 
Br1 
Brl 
8r1 


6/29/87 
6/30/87 
7/06/87 
7/09/87 


Brl 
Br1 
Brl 


7/10/87 
7/13/87 
7/17/87 


Brl 

Br1 
Br1 


7/28/87 
8/05/87 
8/10/87 



8r1 



Br1 

Brl 
Br1 



8/11/87 



Br1 


8/12/87 


Br1 


2/05/88 


Brl 


2/10/88 


Br1 


2/16/88 


Brl 
Br1 
Br1 


2/19/88 
2/22/88 
3/04788 



3/08/88 

3/16/88 
3/18/88 



Added STAT_x labels. Added ROM SIZE calculations.
Changed temporary use to avoid bashing in conjunction with dependency
graph, and larger temporary space. Fault handlers now use FTEMPs
instead of TEMPs.
New trashing specification to make variable use clearer.
More work on method mode of XLATE_EXC.
Stack testing code & boot initialization.
Started converting trap routines from TEMP to stack conventions.
Continued converting to stack conventions.
Removed stack conventions.
Inserted stack conventions.
Started conversions to V8, including the new register instructions.
Continued conversions.
Put some initial garbage collection attempts in.
Put in BRAT manipulation traps. We need more trap vectors for system
calls. So, add a system call location to use another table sometime.
Switched to version 9.
Upgraded XLATE_EXC.
Finished code for XLATE_EXC & method caching, but haven't tested it
yet. Fixed some bugs in the BRAT manipulators.
Tested XLATE_EXC & method caching code. There is a bug after the
METHOD_REQUEST_REPLY that causes a MSG fault. I think that the
METHOD_REQUEST_REPLY message has a length that is maybe 1 too small,
so when the RESTART_CONTEXT message arrives, the last word of the
previous message is used as the message header??? Also updated
os.mdp file.
Fixed the method caching length-of-message problem. Made XFER restore
data registers and ID registers, and not try to reXLATE A0 if its ID
register is nil.
Modified context format to move processor state to the end. Updated
OS.MDP.
Added FREE_CONTEXT_TRP & FREE_CONTEXT_MSG.
Fixed OS.MDP that has OS vars in wrong place.
Added NEW_METHOD_MSG, ID_TO_NODE_TRP,
placed local XLATE in XLATE_EXC (for ID_TO_NODE

IU?1L,'* oth,r * 1 "P'« "»•» of XLATE), "wrote 
SEND_MSG.

Made XFER free contexts. Fixed up SWEEP_TRP.
Finished & tested heap compactor.
Changed ID_TO_NODE_TRP. Removed XLATE_RCVR mode - replaced with
XLATE_LOCAL and in-line code within SEND_MSG. Added XLATE_ID_TO_NODE
mode to XLATE.
Added locked down region to memory map. Made LOCKHEAP equivalent to
PUSH I, MOVE TRUE,I and UNLOCKHEAP to POP I.
Added method cache overflow list support.
Added extended system call mechanism.
Added "copy" bit to method headers. Cached methods are now
distinguished by this copy bit rather than using the method directory
also for this purpose. Started INVADR_EXC handler.



Ir1 9/03/IM 










MM *P WM) 



nf , ft ' »"*. - " *>i*W J*** 



*ift x ' :s-i-'>v : -ftteA.\&^i^ 



: Not*: loth Win cada ianttlMMi «M »*■ 1W*a. 9* mm 



ASM •os.aaf 



«M »M r* .■»*•**• ***** t- 
.1,* w"<$4st- '*ii.s ■"■ 

i a* «twiMi fit* 



MQOUU 

one ioi4 






, 'V J. ■* "»«C , 

'• ■'•■■ I00TC0DC 









BOOT -- This routine contains the cold boot MDP code

BOOT:
        ; Find how much RAM we have
        DC      1024                            ; This is a hack to fill
                                                ; R0 with the amount of RAM
        ; Clear memory
        MOVE    R0,R1                           ; Copy amount of RAM to R1
        MOVE    R0,R2                           ; Also copy to R2
        MOVE    NIL,R0
_BOOT_CLR:
        BZ      R1,^_BOOT_CLRDONE               ; If loop done, break out
        SUB     R1,1,R1                         ; Decrement R1
        MOVE    R0,[R1,A0]                      ; Stick NIL in address
        BR      ^_BOOT_CLR
_BOOT_CLRDONE:
        ; Save the RAM size in the OS variable, now that RAM is clear
        DC      VAR_ROM_START                   ; R0 <- Offset to ROM_START var
        MOVE    R2,[R0,A0]                      ; VAR_ROM_START <- 1st ROM loc

        ; Set up exception vectors & xcall vectors
_BOOT_EXCV:
        DC      ADDR:(EXC_VECTORS<<SYS_LEN_BITS)|OS_EVECTORS_LENGTH
        MOVE    R0,A1
        DC      ADDR:(OS_EVECTORS_BASE<<SYS_LEN_BITS)|OS_EVECTORS_LENGTH
        MOVE    R0,A2
        DC      OS_EVECTORS_LENGTH
_BOOT_EXCV_LOOP:
        BZ      R0,^_BOOT_XCALLV
        SUB     R0,1,R0
        MOVE    [R0,A1],R1
        MOVE    R1,[R0,A2]
        BR      ^_BOOT_EXCV_LOOP
_BOOT_XCALLV:
        DC      ADDR:(XCALL_VECTORS<<SYS_LEN_BITS)|OS_XVECTORS_LENGTH
        MOVE    R0,A1
        DC      ADDR:(OS_XVECTORS_BASE<<SYS_LEN_BITS)|OS_XVECTORS_LENGTH
        MOVE    R0,A2
        DC      OS_XVECTORS_LENGTH
_BOOT_XCALLV_LOOP:
        BZ      R0,^_BOOTSTACKS
        SUB     R0,1,R0
        MOVE    [R0,A1],R1
        MOVE    R1,[R0,A2]
        BR      ^_BOOT_XCALLV_LOOP
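
The two vector-copy loops above share one shape: load the table length, then count down to zero, copying one word per pass from the ROM table (addressed through A1) to the RAM table (addressed through A2). A minimal Python sketch of that countdown copy follows. The function name, the flat list standing in for memory, and the example addresses are illustrative assumptions, not part of the listing.

```python
def copy_table(memory, src_base, dst_base, length):
    """Copy `length` words from src_base to dst_base, highest index first."""
    index = length                      # DC  <table length>   (R0 <- length)
    while index != 0:                   # BZ  R0,^<next section>
        index -= 1                      # SUB R0,1,R0
        # MOVE [R0,A1],R1 then MOVE R1,[R0,A2]
        memory[dst_base + index] = memory[src_base + index]
    return memory


# Hypothetical layout: a 4-entry ROM table at word 40, copied to word 8.
memory = [0] * 64
memory[40:44] = [11, 22, 33, 44]
copy_table(memory, src_base=40, dst_base=8, length=4)
```

Testing the counter before decrementing it lets a zero-length table fall straight through, which matches the BZ placed at the top of each loop.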



Set up stacks 



_BO0TSTACKS: 

X 

WRITER RO.SP 

WRITER RO.SP" 



RO <- 



; Invalidate Queue registers 



_BOOT1: 



.BOOTS: 



WRITER *°^« YS - INVA0 «' (O5_QUeUE0_BASE«SYS_LEN_BITS) |OS_OUeUE0_MASK 

WMTER R ^[ OS - QUEUE0 - BASE « SY S-LEN-8ITS) 

WRITER R^2b2T 5 - 1NVW * 1 < 0S - 0UeU61 -B*««*YS-LEN.BITS) |OS_OUEUE1_MASK 

SSlTER »~ HL «-^'-B*«<<SVS.LEN.BITS, 

; Set up XLATE cache 

WRITER R X5 ?^ M OS - CACHE - BASE<<SYS - LeN - 8ITS > l° s -CACHE_MASK 

; Initialize OS variables 

f£vF °S-LXKED_BAS£*OS_LCCKEO_LENGTH . R0 <_ In1t1 „ he 

rwvt K0»K2 . Copy to M 

SoVE KnT ' R0 <- °""< to HEAP.BASE var 

DC C« ««?,L : Store 1n VAR_HEAP_BASE 

MOVE R2,[R0.AO] . Store 1n VAR_FREETOP 



        DC      VAR_ROM_START                   ; R0 <- Offset to ROM_START var
        MOVE    [R0,A0],R1                      ; R1 <- First ROM location
        DC      OS_INITIAL_BRAT_LENGTH          ; R0 <- Initial size of BRAT
        MOVE    R0,R2                           ; Copy length to R2
        SUB     R1,R0,R1                        ; R1 <- Base of BRAT
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to BRAT_BASE var
        MOVE    R1,[R0,A0]                      ; Store in VAR_BRAT_BASE
        DC      VAR_BRAT_LENGTH                 ; R0 <- Offset to BRAT_LEN var
        MOVE    R2,[R0,A0]                      ; Store len in VAR_BRAT_LENGTH
        DC      OS_INITIAL_BRAT_MASK            ; R0 <- Initial BRAT hash mask
        MOVE    R0,R2                           ; Move to R2
        DC      VAR_BRAT_HASH_MASK              ; R0 <- Offset to hash mask
        MOVE    R2,[R0,A0]                      ; Put initial hash mask in var
        DC      VAR_NEXT_ID                     ; R0 <- Offset to NEXT_ID var
        MOVE    R0,R2                           ; Copy to R2 for safe keeping
        MOVE    0,R0                            ; R0 <- 0
        MOVE    R0,[R2,A0]                      ; VAR_NEXT_ID <- 0
        DC      VAR_LAST_ID                     ; R0 <- Offset to LAST_ID var
        MOVE    R0,R2                           ; Copy to R2 for safe keeping
        DC      SYS_ID_ID_MASK                  ; R0 <- ID field mask
                                                ; (same as last ID)
        MOVE    R0,[R2,A0]                      ; Put last ID in VAR_LAST_ID

        DC      VAR_MCACHE_BASE                 ; R0 <- Offset to mcache var
        MOVE    R0,R1                           ; Swap to R1
        DC      OS_MCACHE_BASE                  ; R0 <- Initial base value
        MOVE    R0,[R1,A0]                      ; Set MCACHE_BASE variable
        DC      VAR_MCACHE_LENGTH               ; R0 <- Offset to mcache length
        MOVE    R0,R1                           ; Swap to R1
        DC      OS_MCACHE_LENGTH                ; R0 <- Initial length value
        MOVE    R0,[R1,A0]                      ; Set MCACHE_LENGTH variable
        DC      VAR_MCACHE_OVERFLOW_LIST        ; R0 <- Addr of oflow list
        MOVE    NIL,R1                          ; R1 <- NIL
        MOVE    R1,[R0,A0]                      ; Set oflow list to NIL

        ; Fill Context free list with a few contexts
BOOT_CFREE_INIT:
        MOVE    3,R3                            ; R3 <- Number of ctxts to make
        MOVE    NIL,R0                          ; R0 <- NIL
        PUSH    R0                              ; Push NIL on the stack
BOOT_CFREE_INIT_LOOP:
        DC      CONT_NORMAL_SIZE                ; R0 <- Size of normal context
        CALL    TRAP_NEW_CONTEXT                ; A1 <- New context address
        MOVE    [OBJECT_ID,A1],R1               ; R1 <- Context ID
        POP     R2                              ; R2 <- Old cfree list
        PUSH    R1                              ; Push new cfree list
        MOVE    R2,[CONT_NEXT_CONTEXT,A1]       ; Next context <- Old cfree list
        SUB     R3,1,R3                         ; Decrement ctxts left to make
        BNZ     R3,^BOOT_CFREE_INIT_LOOP        ; Loop
        DC      VAR_CFREE_LIST                  ; R0 <- Offset to cfree list
        POP     R1                              ; R1 <- Cfree list
        MOVE    R1,[R0,A0]                      ; Set up Cfree list variable



        ; Enable message reception by masking off disable bits
BOOT_ENABLE_QUEUES:
        DC      -SYS_INVADR                     ; R0 <- All bits BUT the
                                                ; invalid address bit
        READR   QBM,R1
        AND     R1,R0,R1                        ; Mask off disable bit
        WRITER  R1,QBM
        READR   QBM',R1
        AND     R1,R0,R1                        ; Mask off disable bit
        WRITER  R1,QBM'
        MOVE    FALSE,R0
        WRITER  R0,I
        BR      ^BKGD_EXC
BOOT_END:



BACKGROUND LOOPS

DIE_TRP:
        BR      ^DIE_TRP
EMPTY_FAULT:
        BR      ^EMPTY_FAULT
EMPTY_TRAP:
        BR      ^EMPTY_TRAP
EMPTY_XCALL:
        BR      ^EMPTY_XCALL
PUSH_EXC:
        BR      ^PUSH_EXC
POP_EXC:
        BR      ^POP_EXC
BKGD_EXC:
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WRITE.MSG -- Message routine to write a block of data to consecutive" 
locations.

WRITE (destination-address) (data)*

WRITE_MSG:
        MOVE    [1,A3],R0                       ; R0 <- Destination address
        MOVE    R0,A2                           ; Move to A2
        DC      SYS_LEN_MASK                    ; R0 <- Mask to keep len bits
        MOVE    [0,A3],R2                       ; R2 <- message header
        WTAG    R2,TAG_INT,R2                   ; Cast header into an INT
        AND     R0,R2,R2                        ; R2 <- message length
        MOVE    2,R0                            ; R0 <- Src offset into queue
        MOVE    0,R1                            ; R1 <- Dest offset into A2
_WRITE_MSG1:
        GE      R0,R2,R3                        ; Are we at the end of message?
        BT      R3,^_WRITE_MSG_EXIT             ; If so, exit
        MOVE    [R0,A3],R3                      ; Get a "hunk o' data"
        MOVE    R3,[R1,A2]                      ; Toss it into the destination
        ADD     R0,1,R0
        ADD     R1,1,R1
        BR      ^_WRITE_MSG1
_WRITE_MSG_EXIT:
        SUSPEND
WRITE_MSG_END:
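
The WRITE handler's comments describe a simple copy: take the message length from the header, skip the header and the destination word, and store the remaining words at consecutive destination offsets. The sketch below models that in Python. The 10-bit `LEN_MASK` is an assumption (a later comment in this listing calls SYS_LEN_MASK a "low 10 bit mask"), and a plain integer base stands in for the MDP address word held in A2.

```python
LEN_MASK = 0x3FF  # assumed width of the header's length field


def write_msg(memory, message):
    """Store message[2:length] at consecutive words starting at message[1]."""
    header = message[0]
    dest = message[1]                   # destination address (plain int here)
    length = header & LEN_MASK          # AND with SYS_LEN_MASK
    src, dst = 2, 0                     # source offset in queue, dest offset
    while src < length:                 # GE src,length -> exit
        memory[dest + dst] = message[src]
        src += 1
        dst += 1
    return dst                          # number of data words written


memory = [0] * 16
# Hypothetical message: header (length 5), destination 3, three data words.
words_written = write_msg(memory, [5, 3, 7, 8, 9])
```

The length counts the header and the destination word, so a message of length 2 writes nothing.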



READ_MSG -- Message routine to read a block of data from consecutive
locations.

READ (source-address) (reply-node) (reply-header)

READ_MSG:
        MOVE    [1,A3],R1                       ; R1 <- address/len of source
        MOVE    R1,A2                           ; Copy to A2
        SEND    [2,A3]                          ; Send reply node number
        DC      SYS_LEN_MASK                    ; R0 <- Mask to keep length
        AND     R1,R0,R1                        ; R1 <- length
        BNZ     R1,^_READ_MSG0                  ; If length != 0, continue
        SENDE   [3,A3]                          ; If no length, just mail hdr
        SUSPEND
_READ_MSG0:
        SUB     R1,1,R1                         ; Convert length to offset
        MOVE    0,R2                            ; Initialize index
        SEND    [3,A3]                          ; Send reply header
_READ_MSG1:
        EQUAL   R1,R2,R0                        ; Is index = final index?
        BT      R0,^_READ_MSG2                  ; If so, use SENDE instead
        SEND    [R2,A2]                         ; Send a word of data
        ADD     R2,1,R2                         ; Increment source index
        BR      ^_READ_MSG1                     ; Loop again
_READ_MSG2:
        SENDE   [R2,A2]                         ; Send final word
        SUSPEND
READ_MSG_END:



CALL_MSG -- Message routine to run a method
CALL (method-id) (method-specific-args)*

CALL_MSG:
        MOVE    [1,A3],R2                       ; R2 <- Method-id
        XLATE   R2,R0,XLATE_METHOD              ; R0 <- Method address
        CHECK   R0,TAG_INT,R1                   ; Is this a hint?
        DC      IP:2                            ; IP <- Offset of 2 into method
        PUSH    R0
        POP     IP
CALL_MSG_END:



SEND_MSG -- Message routine to take an object id, and send the object
referenced by the ID the selector "selector-symbol". If the object
is local, the method is run. If the object is on another node,
we forward the message to the node.

SEND (selector-symbol) (object-id) (args)*

SEND_MSG:
        BR      ^SEND_MSG_START                 ; Jump to main code
SEND_MSG_FORWARD_TO_HOME:
        LSH     R1,-SYS_ID_ID_BITS,R1           ; Shift birthnode number down
        AND     R1,SYS_ID_NODE_MASK,R0          ; Just keep node number field
SEND_MSG_FORWARD_TO_HINT:
        SEND    R0                              ; Send dest. node number
        SUB     R3,1,R3                         ; R3 <- Index to last in queue
        MOVE    0,R0                            ; R0 <- 0
SEND_MSG_FORWARD_LOOP:
        EQUAL   R0,R3,R2                        ; Are we at last item?
        BT      R2,^SEND_MSG_FORWARD_EXIT       ; If so, send with SENDE
        SEND    [R0,A3]                         ; Send item from queue
        ADD     R0,1,R0                         ; Increment R0
        BR      ^SEND_MSG_FORWARD_LOOP
SEND_MSG_FORWARD_EXIT:
        SENDE   [R0,A3]
        SUSPEND

SEND_MSG_START:
        MOVE    [0,A3],R0                       ; R0 <- Message header
        AND     R0,SYS_LEN_MASK,R3              ; R3 <- Length of message
        MOVE    [2,A3],R1                       ; R1 <- Object ID
        XLATE   R1,R0,XLATE_LOCAL               ; R0 <- Bound value of obj ID
        BNIL    R0,^SEND_MSG_FORWARD_TO_HOME    ; If rcvr not here, forward msg
        CHECK   R0,TAG_INT,R2                   ; Is value a hint?
        BT      R2,^SEND_MSG_FORWARD_TO_HINT    ; If so, forward msg to object
        SUB     R3,3,R3                         ; R3 <- Length of args
        MOVE    R0,A2                           ; Copy address to A2
        MOVE    [OBJECT_HDR,A2],R1              ; R1 <- Header of object
        LSH     R1,-SYS_LEN_BITS,R1             ; Shift class down
        AND     R1,SYS_CLASS_MASK,R1            ; R1 <- Class
        DC      SYS_SELECTOR_BITS               ; R0 <- Bits of selector field
        LSH     R1,R0,R1                        ; Shift class field up
        OR      R1,[1,A3],R1                    ; Merge with selector
        WTAG    R1,TAG_CS,R1                    ; Tag as a class/selector
        XLATE   R1,R2,XLATE_METHOD              ; R2 <- Method ID
        DC      MSG:(CALL_MSG<<SYS_LEN_BITS)    ; R0 <- Msg header w/o length
        ADD     R3,2,R3                         ; R1 <- Length of CALL message
        OR      R0,R1,R0                        ; Merge with message length
        MOVE    R2,R1                           ; Copy Method-ID to R1
        CALL    TRAP_ID_TO_NODE                 ; R1 <- Node(Method-ID)
        SEND2   R1,R0                           ; Send node, header
        SUB     R3,2,R3                         ; R3 <- Length of args
        BZ      R3,^SEND_MSG_SEND_LAST          ; If no args, just send meth-ID
        SEND    R2                              ; Send Method-ID
        MOVE    3,R0                            ; R0 <- Offset to args
SEND_MSG_LOOP:
        MOVE    [R0,A3],R2                      ; R2 <- Argument from queue
        ADD     R0,1,R0                         ; Increment arg offset
        SUB     R3,1,R3                         ; Decrement length
        BZ      R3,^SEND_MSG_SEND_LAST          ; If last arg, send & end
        SEND    R2                              ; Send argument
        BR      ^SEND_MSG_LOOP                  ; Loop
SEND_MSG_SEND_LAST:
        SENDE   R2                              ; Send R2 and end
        SUSPEND
SEND_MSG_END:
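
The SEND handler makes a three-way decision on the receiver's binding: no local binding means forward to the node encoded in the object ID, an integer binding is a hint naming another node, and anything else is a local object whose class is merged with the selector to look up the method. The Python sketch below models only that decision and the key construction. The field widths (`ID_BITS`, `NODE_MASK`, `SELECTOR_BITS`) and the use of `None`, `int`, and `dict` for the three binding cases are assumptions made for illustration.

```python
ID_BITS = 16          # assumed width of the ID field within an object ID
NODE_MASK = 0xFFFF    # assumed mask for the node-number field
SELECTOR_BITS = 16    # assumed width of the selector field


def home_node(object_id):
    """Node number encoded in the upper part of an object ID."""
    return (object_id >> ID_BITS) & NODE_MASK


def route_send(binding, object_id):
    """Decide where a SEND goes, given the local binding of the receiver."""
    if binding is None:                     # BNIL -> forward to home node
        return ("forward", home_node(object_id))
    if isinstance(binding, int):            # CHECK TAG_INT -> binding is a hint
        return ("forward", binding)
    return ("run-local", binding)           # binding is the object itself


def class_selector_key(object_class, selector):
    """Key used to look up a method: class shifted above the selector."""
    return (object_class << SELECTOR_BITS) | selector


oid = (3 << ID_BITS) | 42
unbound = route_send(None, oid)
hinted = route_send(7, oid)
local = route_send({"class": 2}, oid)
key = class_selector_key(2, 5)
```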



NEW_METHOD -- Message handler to allocate and fill a method for a given
class/selector pair. This routine calls the InstallMethod handler
to make the class/selector/ID bindings, but this routine suspends
after calling InstallMethod, without waiting for it to complete.

NEW_METHOD (class) (selector) (size-of-code) (code)*

NEW_METHOD_MSG:
        MOVE    [3,A3],R0                       ; R0 <- Size of code
        ADD     R0,2,R0                         ; Add in 2 header words
        MOVE    CLASS_METHOD,R1                 ; R1 <- "Method" class
        CALL    TRAP_NEW                        ; Allocate an object
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Address of object
        MOVE    4,R1                            ; R1 <- Source offset
        MOVE    2,R2                            ; R2 <- Dest offset
        MOVE    [3,A3],R0                       ; R0 <- Size of code
NEW_METHOD_MSG_LOOP:
        BZ      R0,^NEW_METHOD_MSG_INSTALL      ; If no size left then install
        MOVE    [R1,A3],R3                      ; R3 <- Data word
        MOVE    R3,[R2,A2]                      ; Put data word in object
        SUB     R0,1,R0                         ; Decrement size
        ADD     R1,1,R1                         ; Increment source
        ADD     R2,1,R2                         ; Increment destination
        BR      ^NEW_METHOD_MSG_LOOP            ; Loop
NEW_METHOD_MSG_INSTALL:
        MOVE    NNR,R1                          ; R1 <- This node number
        DC      MSG:(CALL_MSG<<SYS_LEN_BITS)|5  ; R0 <- header
        SEND2   R1,R0                           ; Send node, header
        DC      HANDLER_INSTALL_METHOD          ; R0 <- ID of InstallMethod
        SEND    R0                              ; Send InstallMethod ID
        SEND    [1,A3]                          ; Send class
        SEND    [2,A3]                          ; Send selector
        SENDE   [OBJECT_ID,A2]                  ; Send method ID & end
        SUSPEND
NEW_METHOD_MSG_END:





NEW_MSG -- Message routine to create a new instance of a certain class and
mail back the ID.

NEW (size-of-object) (class) (reply-id) (reply-selector) (optional-data)*

NEW_MSG:
        MOVE    [1,A3],R0                       ; R0 <- length of object
        MOVE    [2,A3],R1                       ; R1 <- class
        CALL    TRAP_NEW                        ; Make a new object
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Address of object

        ; <<< Copy Optional Data >>>

        DC      SYS_LEN_MASK                    ; R0 <- low 10 bit mask
        MOVE    [0,A3],R1                       ; R1 <- Message header
        WTAG    R1,TAG_INT,R1                   ; Cast into an INT
        AND     R0,R1,R0                        ; R0 <- length of message
        SUB     R0,5,R0                         ; Ignore first 5 arguments,
                                                ; leaving optional data
                                                ; length in R0
        MOVE    5,R1                            ; R1 <- offset into queue
        MOVE    2,R2                            ; R2 <- offset into object
_NEW_MSG1:
        BZ      R0,^_NEW_MSGEXIT                ; If no data left, exit
        SUB     R0,1,R0                         ; Decrement count
        MOVE    [R1,A3],R3                      ; R3 <- data from msg. stream
        MOVE    R3,[R2,A2]                      ; Store data in object
        ADD     R1,1,R1                         ; Increment offsets
        ADD     R2,1,R2
        BR      ^_NEW_MSG1                      ; Loop
_NEW_MSGEXIT:
        MOVE    [3,A3],R1                       ; R1 <- reply id
        DC      INT:-SYS_ID_ID_BITS             ; R0 <- # of bits of ID
        LSH     R1,R0,R0                        ; Shift node # down & put in R0
        SEND    R0                              ; Send destination node
        DC      MSG:(SEND_MSG<<SYS_LEN_BITS)|4  ; R0 <- SEND message header
        SEND    R0                              ; Mail out the header
        SEND    [3,A3]                          ; Send the target id
        SEND    [4,A3]                          ; Send the selector
        SENDE   [1,A2]                          ; Send new obj ID as final arg
        SUSPEND
NEW_MSG_END:



METHOD_REQUEST_MSG -- Look up a method and mail the ENTIRE method
including headers to the requester in a METHOD_REQUEST_REPLY wrapper.

METHOD_REQUEST (method-ID) (reply-node)

Runs under: A0 Absolute mode, Unchecked

METHOD_REQUEST_MSG:
        MOVE    [1,A3],R1                       ; R1 <- Method ID
        MOVE    [2,A3],R2                       ; R2 <- Requester node #
        XLATE   R1,A2,XLATE_METHOD              ; A2 <- Address of method
        DC      SYS_LEN_MASK                    ; R0 <- Mask to keep len field
        AND     R0,A2,R3                        ; R3 <- Length of method
        ADD     R3,2,R3                         ; R3 <- Length of method
                                                ; + 2 words for msg & ID,
                                                ; yielding message length
        DC      MSG:(METHOD_REQUEST_REPLY_MSG<<SYS_LEN_BITS)|SYS_UNC
        OR      R0,R3,R0                        ; R0 <- Message header
        SEND2   R2,R0                           ; Send dest node# & msg header
        SEND    R1                              ; Send method-ID
        SUB     R3,2,R3                         ; R3 <- Method length
        MOVE    0,R0                            ; Current index = 0
_METHOD_REQUEST_LOOP:
        SUB     R3,1,R3                         ; Decrement length
        BZ      R3,^_METHOD_REQUEST_SEND_LAST   ; If length = 0, send last word
        SEND    [R0,A2]                         ; Mail out method word
        ADD     R0,1,R0                         ; Increment index
        BR      ^_METHOD_REQUEST_LOOP           ; Loop
_METHOD_REQUEST_SEND_LAST:
        SENDE   [R0,A2]                         ; Send final method word
        SUSPEND
METHOD_REQUEST_MSG_END:



METHOD_REQUEST_REPLY_MSG -- Store the method in an object and restart the
wait list.

METHOD_REQUEST_REPLY (method-ID) (method-data)*

Runs under: A0 absolute mode, Unchecked

METHOD_REQUEST_REPLY_MSG:
        DC      SYS_LEN_MASK                    ; R0 <- Mask to keep length
        AND     R0,[0,A3],R0                    ; R0 <- Length of message
        PUSH    R0                              ; Save R0 on stack
        SUB     R0,2,R0                         ; Ignore message header & ID
        MOVE    CLASS_METHOD,R1                 ; R1 <- Class of a method
        CALL    TRAP_NEW                        ; Make a method object
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Address of object
        DC      SYS_COPY_MASK                   ; R0 <- Copy bit
        OR      R0,[OBJECT_HDR,A2],R0           ; R0 <- Hdr marked as a copy
        MOVE    R0,[OBJECT_HDR,A2]              ; Mark object as a copy
        POP     R0                              ; Restore R0 (length of msg)
        SUB     R0,4,R0                         ; R0 <- Len of method w/o hdrs
        MOVE    4,R2                            ; R2 <- Source index
        MOVE    2,R1                            ; R1 <- Destination index
M_R_R_FILL_OBJ:
        BZ      R0,^M_R_R_COPIED                ; If no more length, exit loop
        MOVE    [R2,A3],R3                      ; R3 <- Word from message
        MOVE    R3,[R1,A2]                      ; Put word in method object
        ADD     R1,1,R1                         ; Increment destination index
        ADD     R2,1,R2                         ; Increment source index
        SUB     R0,1,R0                         ; Decrement length left
        BR      ^M_R_R_FILL_OBJ                 ; Loop
M_R_R_COPIED:
        MOVE    [1,A3],R0                       ; R0 <- Original method-ID
        MOVE    A2,R1                           ; R1 <- Method copy address
        ENTER   R0,R1                           ; Enter in XLATE cache
        MOVE    XCALL_BRAT_ENTER_NEW,R3         ; R3 <- BRAT EnterNew Xcall #
        CALL    TRAP_XCALL                      ; Enter in BRAT
        DC      VAR_MCACHE_BASE
        MOVE    [R0,A0],R2                      ; R2 <- Offset to method cache
        DC      VAR_MCACHE_LENGTH
        MOVE    [R0,A0],R3                      ; R3 <- Word size of cache
        MOVE    [1,A3],R1                       ; R1 <- Method ID from message
        ADD     R2,R3,R2                        ; R2 <- Offset past mcache

        ; Search the Method Cache directory.

M_R_R_SEARCH_MC_ID:
        SUB     R2,2,R2                         ; Decrement offset
        SUB     R3,2,R3                         ; Decrement length
        EQ      R1,[R2,A0],R0                   ; Is this the id we want?
        BT      R0,^M_R_R_FOUND_MC_ID           ; If so, branch
        BNZ     R3,^M_R_R_SEARCH_MC_ID          ; If length != 0, loop
        BR      ^M_R_R_NOT_IN_MCACHE            ; If not in MC, check oflow list
M_R_R_FOUND_MC_ID:
        MOVE    NIL,R0                          ; R0 <- NIL
        MOVE    R0,[R2,A0]                      ; Set ID to NIL
        ADD     R2,1,R2                         ; Point offset to wait list
        MOVE    [R2,A0],R3                      ; R3 <- (car wait-list)
        MOVE    R0,[R2,A0]                      ; Set wait list to NIL
M_R_R_RESTART_CTXT_FROM_MCACHE:
        BNIL    R3,^M_R_R_EXIT                  ; If context ID is nil, exit
        READR   NNR,R2                          ; R2 <- This NNR
        SEND    R2                              ; Send a message to this node
        DC      MSG:(RESTART_CONTEXT_MSG<<SYS_LEN_BITS)|2|SYS_UNC
        SEND    R0                              ; Send message header
        SENDE   R3                              ; Send ID to restart
        XLATE   R3,A2,XLATE_OBJ                 ; Get address of context
        MOVE    [CONT_NEXT_CONTEXT,A2],R3       ; R3 <- next ctxt ID in list
        BR      ^M_R_R_RESTART_CTXT_FROM_MCACHE
M_R_R_EXIT:
        SUSPEND

If not in MCACHE directory, search overflow list. Use R2 to hold
the previous context ID, and R3 the current context ID. Use these
pointers to delink items from the overflow list.

M_R_R_NOT_IN_MCACHE:
        MOVE    NIL,R2                          ; No previous ID
        DC      VAR_MCACHE_OVERFLOW_LIST        ; R0 <- Addr of oflow list
        MOVE    [R0,A0],R3                      ; R3 <- Car of overflow list
M_R_R_LOOP_THRU_OVERFLOW_LIST:
        BNIL    R3,^M_R_R_EXIT                  ; When list NIL, exit
        XLATE   R3,A2,XLATE_OBJ                 ; A2 <- Context Addr
        EQ      R1,[CONT_RESOURCE,A2],R0        ; Waiting for this method?
        BT      R0,^M_R_R_UNLINK_CTXT           ; If so, cut ctxt out of list
        MOVE    R3,R2                           ; Prev ID <- Current ID
        MOVE    [CONT_NEXT_CONTEXT,A2],R3       ; R3 <- next ctxt ID in list
        BR      ^M_R_R_LOOP_THRU_OVERFLOW_LIST
M_R_R_UNLINK_CTXT:
        BNNIL   R2,^M_R_R_UNLINK_MIDDLE_CONTEXT ; If prev != nil, link to next
M_R_R_UNLINK_FIRST_CONTEXT:
        MOVE    [CONT_NEXT_CONTEXT,A2],R3       ; R3 <- Next context
        DC      VAR_MCACHE_OVERFLOW_LIST        ; R0 <- Addr of oflow list
        MOVE    R3,[R0,A0]                      ; Overflow list <- Next ctxt
        MOVE    R2,[CONT_NEXT_CONTEXT,A2]       ; Next context ptr <- NIL
        MOVE    [OBJECT_ID,A2],R0               ; R0 <- Ctxt ID
        BR      ^M_R_R_RESTART_CTXT_FROM_LIST   ; Queue up context for execution
M_R_R_LILYPAD:
        BR      ^M_R_R_LOOP_THRU_OVERFLOW_LIST  ; Hop to where we want to be
M_R_R_UNLINK_MIDDLE_CONTEXT:
        MOVE    [CONT_NEXT_CONTEXT,A2],R3       ; R3 <- Next context
        MOVE    NIL,R0                          ; R0 <- NIL
        MOVE    R0,[CONT_NEXT_CONTEXT,A2]       ; Next context <- NIL
        MOVE    [OBJECT_ID,A2],R0               ; R0 <- ID to clipped-out ctxt
        XLATE   R2,A2,XLATE_OBJ                 ; A2 <- Prev context addr
        MOVE    R3,[CONT_NEXT_CONTEXT,A2]       ; Prev --> Next (skipping curr)
M_R_R_RESTART_CTXT_FROM_LIST:
        PUSH    R0                              ; Save context ID
        READR   NNR,R0                          ; R0 <- This NNR
        SEND    R0                              ; Send a message to this node
        DC      MSG:(RESTART_CONTEXT_MSG<<SYS_LEN_BITS)|2|SYS_UNC
        SEND    R0                              ; Send message header
        POP     R0                              ; Restore context ID
        SENDE   R0                              ; Send ID to restart
        BR      ^M_R_R_LILYPAD                  ; Go to next element in list
METHOD_REQUEST_REPLY_END:
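
The overflow-list pass described above is a single walk over a singly linked list with a previous pointer and a current pointer: a context waiting on the arrived method is cut out (by updating either the list head or the previous context's next pointer) and queued for restart, and the walk resumes from its successor. The Python sketch below models this with dictionaries. The data layout and the function name are assumptions, and the list of restarted IDs stands in for the RESTART_CONTEXT messages.

```python
def unlink_waiters(contexts, head, method_id):
    """Remove every context waiting on method_id from the overflow list.

    contexts maps a context ID to {"next": ID or None, "resource": method ID}.
    Returns (new_head, restarted_ids).
    """
    restarted = []
    prev, cur = None, head              # R2 = previous ID, R3 = current ID
    while cur is not None:              # BNIL R3 -> exit
        ctx = contexts[cur]
        nxt = ctx["next"]
        if ctx["resource"] == method_id:
            if prev is None:
                head = nxt              # unlink first: list head <- next
            else:
                contexts[prev]["next"] = nxt   # unlink middle: prev --> next
            ctx["next"] = None          # clear the clipped context's link
            restarted.append(cur)       # stands in for RESTART_CONTEXT
        else:
            prev = cur                  # only advance prev past kept contexts
        cur = nxt
    return head, restarted


contexts = {
    1: {"next": 2, "resource": "m"},
    2: {"next": 3, "resource": "x"},
    3: {"next": None, "resource": "m"},
}
new_head, restarted = unlink_waiters(contexts, head=1, method_id="m")
```

Leaving the previous pointer unchanged after an unlink is what lets consecutive waiters be removed correctly, and it mirrors the listing, where R2 is not updated on the unlink paths.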
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MIGRATE_08JECT_MSG -- Move an object to a new node 
MIGRATE_OBJECT (object-id) (node-number)
Runs under: A0 Absolute mode

MIGRATE_OBJECT_MSG:
        MOVE    [1,A3],R0                       ; R0 <- Object ID
        MOVE    [2,A3],R1                       ; R1 <- Dest node number
        MOVE    XCALL_MIGRATE_OBJECT,R3
        CALL    TRAP_XCALL                      ; Migrate the object
        SUSPEND
MIGRATE_OBJECT_MSG_END:



IMMIGRATE_OBJECT_MSG -- Let this object reside on this node
IMMIGRATE_OBJECT (object-id) (previous-residence) (object-data)*
Runs under: A0 Absolute mode, unchecked

IMMIGRATE_OBJECT_MSG:
        PUSH    I                               ; Save interrupt status
        MOVE    TRUE,R3                         ; R3 <- True
        MOVE    R3,I                            ; Disable interrupts
        MOVE    [0,A3],R0                       ; R0 <- Message header
        AND     R0,SYS_LEN_MASK,R0              ; R0 <- Message length
        PUSH    R0                              ; Save message length
        SUB     R0,3,R0                         ; R0 <- Object length
        MOVE    [3,A3],R1                       ; R1 <- Object header
        LSH     R1,-SYS_LEN_BITS,R1             ; Shift class down
        AND     R1,SYS_CLASS_MASK,R1            ; R1 <- Class of object
        CALL    TRAP_MALLOC                     ; Mallocate me some memory
        MOVE    [3,A3],R2                       ; R2 <- Object header
        DC      SYS_UNMOVABLE_MASK              ; R0 <- Unmovable bit
        OR      R2,R0,R2                        ; Set unmovable bit in header
        MOVE    R2,[0,A2]                       ; Set header of new object
        MOVE    [1,A3],R0                       ; R0 <- Object ID
        MOVE    A2,R1                           ; R1 <- Address of block
        ENTER   R0,R1                           ; Enter ID/ADDR in XLATE table
        MOVE    XCALL_BRAT_ENTER_NEW,R3         ; R3 <- BRAT EnterNew Xcall #
        CALL    TRAP_XCALL                      ; Enter in BRAT
        MOVE    R0,[1,A2]                       ; Fill 2nd slot with ID
        POP     R0                              ; R0 <- Message length
        SUB     R0,1,R1                         ; R1 <- Offset to last msg word
        SUB     R0,4,R0                         ; R0 <- Offset to end of dest
IMMIGRATE_OBJECT_LOOP:
        EQUAL   R1,4,R2                         ; At first data word?
        BT      R2,^IMMIGRATE_OBJECT_EXIT       ; If so, done
        MOVE    [R1,A3],R2                      ; R2 <- data word
        MOVE    R2,[R0,A2]                      ; Put data word in object
        SUB     R0,1,R0                         ; Decrement R0
        SUB     R1,1,R1                         ; Decrement R1
        BR      ^IMMIGRATE_OBJECT_LOOP          ; Loop
IMMIGRATE_OBJECT_EXIT:
        POP     I                               ; Pop int. disable flag
        DC      MSG:SYS_UNC|(NOW_RESIDING_AT_MSG<<SYS_LEN_BITS)|3
        SEND2   [2,A3],R0                       ; Send previous node #, header
        MOVE    NNR,R0                          ; R0 <- This node number
        SEND2E  [1,A3],R0                       ; Send obj ID and this node #
        SUSPEND
IMMIGRATE_OBJECT_MSG_END:



NOW_RESIDING_AT_MSG -- Notify old residence of new residence & tell birthnode
NOW_RESIDING_AT (object-id) (residence-node)
Runs under: A0 Absolute mode, unchecked

NOW_RESIDING_AT_MSG:
        MOVE    R0,R0                           ; NOP to prevent EARLY fault
        MOVE    [1,A3],R0                       ; R0 <- Object ID
        MOVE    [2,A3],R1                       ; R1 <- Residence node #
        ENTER   R0,R1                           ; Cache R0 -> R1
        MOVE    XCALL_BRAT_ENTER,R3             ; R3 <- BRAT_ENTER Xcall #
        CALL    TRAP_XCALL                      ; Bind in BRAT
        MOVE    [1,A3],R1                       ; R1 <- Object ID
        LSH     R1,-SYS_ID_ID_BITS,R1           ; Shift birthnode number down
        WTAG    R1,TAG_INT,R1                   ; Set tag to INT
        DC      MSG:SYS_UNC|(UPDATE_BIRTHNODE_MSG<<SYS_LEN_BITS)|4
        SEND2   R1,R0                           ; Send header to birthnode
        SEND    [1,A3]                          ; Send object ID
        SEND    [2,A3]                          ; Send new residence node
        MOVE    NNR,R0                          ; R0 <- This node #
        SENDE   R0                              ; Send # as previous residence
        SUSPEND
NOW_RESIDING_AT_MSG_END:



UPDATE_BIRTHNODE_MSG -- Notify the birthnode of the new residence, and
mark the object movable

UPDATE_BIRTHNODE (object-id) (residence-node) (previous-node)
Runs under: A0 Absolute mode, unchecked

UPDATE_BIRTHNODE_MSG:
        MOVE    NNR,R2                          ; R2 <- This node #
        MOVE    [1,A3],R0                       ; R0 <- Object ID
        MOVE    [2,A3],R1                       ; R1 <- Residence node #
        MOVE    [3,A3],R3                       ; R3 <- Previous node #
        EQUAL   R3,R2,R2                        ; Was guy previously here?
        BT      R2,^UPDATE_BIRTHNODE_MOVABLE    ; If so, don't rebind again
        ENTER   R0,R1                           ; Cache R0 -> R1
        MOVE    XCALL_BRAT_ENTER,R3             ; R3 <- BRAT_ENTER Xcall #
        CALL    TRAP_XCALL                      ; Bind in BRAT
UPDATE_BIRTHNODE_MOVABLE:
        DC      MSG:SYS_UNC|(OBJECT_MOVABLE_MSG<<SYS_LEN_BITS)|2
        SEND2   R1,R0                           ; Send header to residence
        SENDE   [1,A3]                          ; Send object ID
        SUSPEND
UPDATE_BIRTHNODE_MSG_END:



OBJECT_MOVABLE_MSG -- Mark the object movable

OBJECT_MOVABLE (object-id)

Runs under: A0 Absolute mode, unchecked

OBJECT_MOVABLE_MSG:
        MOVE    R0,R0                           ; NOP to prevent EARLY fault
        MOVE    [1,A3],R0                       ; R0 <- Object ID
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Object address
        MOVE    [0,A2],R1                       ; R1 <- Object header
        DC      -SYS_UNMOVABLE_MASK             ; R0 <- All but unmovable bit
        AND     R1,R0,R1                        ; R1 <- Movable object header
        MOVE    R1,[0,A2]                       ; Put header back in object
        SUSPEND
OBJECT_MOVABLE_MSG_END:
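
Read together, the handlers above form a four-message handshake: IMMIGRATE_OBJECT installs the object on the new node with its unmovable bit set, NOW_RESIDING_AT leaves a forwarding binding on the old node, UPDATE_BIRTHNODE rebinds the ID on the birthnode (skipped when the birthnode was the previous residence, since it was already rebound), and OBJECT_MOVABLE clears the unmovable bit. The Python sketch below runs that sequence synchronously on dictionaries. The data layout is an assumption, and removing the object from the source node is assumed to happen inside the XCALL_MIGRATE_OBJECT routine, which is not shown in this section.

```python
def migrate(nodes, obj_id, birth, src, dst):
    """Run the migration handshake; returns the (message, node) sequence."""
    log = []
    # Assumed effect of XCALL_MIGRATE_OBJECT on the source node.
    obj = nodes[src]["objects"].pop(obj_id)

    # IMMIGRATE_OBJECT on dst: install the object, marked unmovable.
    obj["movable"] = False
    nodes[dst]["objects"][obj_id] = obj
    log.append(("IMMIGRATE_OBJECT", dst))

    # NOW_RESIDING_AT on src: bind the ID to the new residence node.
    nodes[src]["hints"][obj_id] = dst
    log.append(("NOW_RESIDING_AT", src))

    # UPDATE_BIRTHNODE on birth: rebind unless the object just left here.
    if src != birth:
        nodes[birth]["hints"][obj_id] = dst
    log.append(("UPDATE_BIRTHNODE", birth))

    # OBJECT_MOVABLE on dst: the object may migrate again.
    obj["movable"] = True
    log.append(("OBJECT_MOVABLE", dst))
    return log


nodes = {n: {"objects": {}, "hints": {}} for n in range(3)}
nodes[1]["objects"][42] = {"data": [1, 2, 3], "movable": True}
nodes[0]["hints"][42] = 1               # birthnode 0 currently points at node 1
log = migrate(nodes, obj_id=42, birth=0, src=1, dst=2)
```

Keeping the object unmovable until the birthnode has been updated means a second migration cannot start while bindings for the first are still in flight.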



SYSTEM CALL TRAPS
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XCALL_TRP -- Call an extended system call 

Runs under: AO absolute mode, unchecked 
Inputs: R3 
Trashes: R3 



XCALL_TRP:
        PUSH    R0                              ; Save R0
        DC      OS_XVECTORS_BASE                ; R0 <- Base of xvectors
        ADD     R0,R3,R3                        ; R3 <- Xvectors + xcall #
        MOVE    [R3,A0],R3                      ; R3 <- Xcall routine IP
        POP     R0                              ; Restore R0
        MOVE    R3,IP                           ; Go to XCALL routine
XCALL_TRP_END:
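
XCALL_TRP is a table dispatch: the extended call number in R3 indexes a second vector table, and control jumps to the routine found there. A minimal Python sketch of the same idea follows, with a list of callables standing in for the xvector table. The handler names and bodies are placeholders invented for the example.

```python
def make_xcall(xvectors):
    """Return a dispatcher that jumps through the given vector table."""
    def xcall(number, *args):
        handler = xvectors[number]      # MOVE [R3,A0],R3
        return handler(*args)           # MOVE R3,IP
    return xcall


# Placeholder handlers; the real table holds routine addresses.
def brat_enter(key, value):
    return ("enter", key, value)


def brat_purge(key):
    return ("purge", key)


xcall = make_xcall([brat_enter, brat_purge])
entered = xcall(0, "id-7", "addr-100")
purged = xcall(1, "id-7")
```

The indirection costs one extra table load per call, which is the price the edit history accepts for having more system calls than trap vectors.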



SWEEP_TRP -- Sweep all non-marked objects in the heap down
towards the base.

Runs under: A0 shadow

SWEEP_TRP:
        BR      ^SWEEP_TRP_START                ; Go to main code
_SWEEP_EXIT:
        DC      VAR_FREETOP                     ; R0 <- &FREETOP
        MOVE    R1,[R0,A0]                      ; FREETOP <- New destination
        POP     I
        POP     R3
        POP     R2
        POP     R1
        POP     R0
        POP     IP
SWEEP_TRP_START:
        PUSH    R0
        PUSH    R1
        PUSH    R2
        PUSH    R3
        DC      VAR_HEAP_BASE                   ; R0 <- Address of HEAP_BASE
        MOVE    [R0,A0],R2                      ; R2 <- Initial source
        MOVE    R2,R1                           ; R1 <- Initial destination
_SWEEP_LOOP:
        PUSH    I
        MOVE    TRUE,R0                         ; R0 <- True
        MOVE    R0,I                            ; Prevent interrupts
        DC      VAR_FREETOP                     ; R0 <- &FREETOP
        MOVE    [R0,A0],R0                      ; R0 <- End of heap
        GE      R2,R0,R0                        ; At or past the end of heap?
        BT      R0,^_SWEEP_EXIT                 ; If so, then exit
_SWEEP_CONTINUE:
        DC      SYS_MARK_MASK                   ; R0 <- Deletion flag mask
        AND     R0,[R2,A0],R0                   ; R0 <- Only deletion bit
        BZ      R0,^_SWEEP_COPY                 ; If not deleted, copy object
        ADD     R2,1,R2                         ; R2 <- Offset to object ID
        MOVE    [R2,A0],R0                      ; R0 <- Object ID
        PURGE   R0                              ; Remove object ID from cache
        MOVE    XCALL_BRAT_PURGE,R3             ; R3 <- BRAT Purge Xcall #
        CALL    TRAP_XCALL                      ; Remove object ID from BRAT
        SUB     R2,1,R2                         ; Make R2 be offset to object
        MOVE    [R2,A0],R0                      ; R0 <- Header of object
        AND     R0,SYS_LEN_MASK,R0              ; R0 <- Length of object
        ADD     R2,R0,R2                        ; Point src to next object
_SWEEP_ITERATE:
        BR      ^_SWEEP_LOOP                    ; Go to next iteration
_SWEEP_COPY:
        MOVE    [R2,A0],R0                      ; R0 <- Header of object
        AND     R0,SYS_LEN_MASK,R0              ; R0 <- Length of object
        ADD     R2,R0,R2                        ; R2 <- End of src
        ADD     R1,R0,R1                        ; R1 <- End of dest
        EQUAL   R1,R2,R3                        ; Does src = dest?
        BT      R3,^_SWEEP_ITERATE              ; If so, go to next object
_SWEEP_COPY_LOOP:
        BNZ     R0,^_SWEEP_COPY_LOOP2           ; If R0 != 0 continue copying
        LSH     R1,SYS_LEN_BITS,R3              ; R3 <- dest.addr << len bits
        MOVE    [R1,A0],R0                      ; R0 <- Header of object
        AND     R0,SYS_LEN_MASK,R0              ; R0 <- Length of object
        OR      R0,R3,R0                        ; R0 <- base | len
        OR      R0,SYS_REL_MASK,R0              ; Mark R0 as relocatable
        WTAG    R0,TAG_ADDR,R0                  ; Tag as an address
        PUSH    R1                              ; Save R1
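
The sweep comments describe a sliding compactor: a source pointer visits each object in heap order, objects whose header carries the deletion bit are dropped and their IDs purged from the translation tables, and surviving objects are slid down to a destination pointer that becomes the new top of the heap. The Python sketch below models only that bookkeeping on a list of object records. The record layout is an assumption, and word-by-word copying and address rebinding are reduced to assigning a new base.

```python
def sweep(heap):
    """Compact a heap given as object records in address order.

    Each record is {"id": ..., "length": words, "deleted": bool}.
    Returns (survivors with new "base" offsets, purged IDs, new free top).
    """
    dest = 0                            # R1: next free destination offset
    survivors, purged = [], []
    for obj in heap:                    # R2 walks the source objects
        if obj["deleted"]:
            purged.append(obj["id"])    # PURGE + XCALL_BRAT_PURGE
            continue
        moved = dict(obj, base=dest)    # slide the object down to dest
        survivors.append(moved)
        dest += obj["length"]
    return survivors, purged, dest      # dest becomes FREETOP


heap = [
    {"id": "a", "length": 3, "deleted": False},
    {"id": "b", "length": 2, "deleted": True},
    {"id": "c", "length": 4, "deleted": False},
]
survivors, purged, free_top = sweep(heap)
```

Because objects only ever move toward the base and keep their order, a surviving object never overwrites one that has not been visited yet.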
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NEW_CONTEXT_TRP -- Create a context for a process 

This trap creates a context object when given the size of args
and locals in R0. The context created looks like:

        start  0:  | Header       |
        start  1:  | Context-ID   |
        start  2:  | PstateOffset |  (Offset from Header to pstate)
        start  3:  | Next-Context |
        start  4:  | Resource     |
        start  5:  | Space        |  --- Length of space in R0
                   /\/\/\/\/\/\/\/\
                   \/\/\/\/\/\/\/\/
        pstate     | ID0          |  (Method ID)
        pstate     | ID1          |
        pstate     | ID2          |
        pstate     | ID3          |
        pstate     | R0           |
        pstate     | R1           |
        pstate     | R2           |
        pstate     | R3           |
        pstate     | IP           |



The address of the block is returned in A1 & A2. The accompanying

HMrrZ-£% E ? . _I ? f l? ?* lre f1l1ed 1n b * th1 * routine. The 
to fill in the ID0-3, R0-3, and IP slots since these values may be
filled in with the offset from the header of the context. This field
of conte" "" bu1,d1n ' <" • P°*"t.r to the pstate portion 

If the space needed is <= the normal context size (defined



Runs under:     A0 absolute mode, unchecked
Inputs:         R0
Outputs:        A1,ID1,A2,ID2
Trashes:        R0



NEW_CONTEXT_TRP:
        PUSH    R1                              ; Save R1
        PUSH    R2                              ; Save R2
        PUSH    R0                              ; Save R0
        DC      VAR_CFREE_LIST                  ; R0 <- Base of Cfree list
        MOVE    R0,R2                           ; Swap to R2
        POP     R0                              ; Restore R0 with user size
        GT      R0,CONT_NORMAL_SIZE,R1          ; Is size > normal size?
        BT      R1,^NEW_CONTEXT_TRP_ALLOC       ; If so, allocate a new context
        MOVE    [R2,A0],R1                      ; R1 <- 1st ctxt in free list
        BNIL    R1,^NEW_CONTEXT_TRP_ALLOC       ; If no more normal, then alloc
        XLATE   R1,A1,XLATE_OBJ                 ; A1 <- Context Addr
        XLATE   R1,A2,XLATE_OBJ                 ; A2 <- Context Addr
        MOVE    [CONT_NEXT_CONTEXT,A1],R0       ; R0 <- Next Context
        MOVE    R0,[R2,A0]                      ; Point cfree list to next ctxt
        MOVE    NIL,R0                          ; R0 <- NIL
        MOVE    R0,[CONT_NEXT_CONTEXT,A1]       ; Erase next ctxt ptr (for gc)
        POP     R2                              ; Restore R2
        POP     R1                              ; Restore R1
        POP     IP                              ; Return
NEW_CONTEXT_TRP_ALLOC:
        ADD     R0,9,R0                         ; R0 <- Offset to pstate
        PUSH    R0                              ; Save pstate offset
        ADD     R0,CONT_PSTATE_SIZE,R0          ; R0 <- Total context obj size
        MOVE    CLASS_CONTEXT,R1                ; R1 <- "context" class value
        CALL    TRAP_NEW                        ; Make a new object
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Address of object
        XLATE   R0,A1,XLATE_OBJ                 ; Copy to A1
        POP     R0                              ; Restore pstate offset
        POP     R2                              ; Restore R2
        POP     R1                              ; Restore R1
        MOVE    R0,[CONT_PSTATE_OFFSET,A2]      ; Fill PSTATE-OFFSET ctxt field
        MOVE    NIL,R0                          ; R0 <- NIL
        MOVE    R0,[CONT_NEXT_CONTEXT,A2]       ; No next context
        POP     IP
NEW_CONTEXT_TRP_END:
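
The fast path of NEW_CONTEXT_TRP can be sketched in Python as follows. This is a minimal model, not the ROM code: NORMAL_SIZE, ContextPool and the dictionary layout are hypothetical stand-ins, and the recycle helper exists only so the example can put a context back on the free list.

```python
NORMAL_SIZE = 8  # hypothetical stand-in for CONT_NORMAL_SIZE


class ContextPool:
    """Toy model of the context free list consulted by NEW_CONTEXT_TRP."""

    def __init__(self):
        self.free_head = None        # plays the role of VAR_CFREE_LIST
        self.fresh_allocations = 0   # counts trips through the slow path

    def _allocate(self, size):
        # Slow path: build a brand new context object (TRAP_NEW in the listing).
        self.fresh_allocations += 1
        return {"space": [None] * size, "next": None}

    def new_context(self, size):
        ctx = self.free_head
        # Oversized request, or nothing on the free list: allocate fresh.
        if size > NORMAL_SIZE or ctx is None:
            return self._allocate(size)
        # Fast path: unlink the first free context and clear its next pointer.
        self.free_head = ctx["next"]
        ctx["next"] = None
        return ctx

    def recycle(self, ctx):
        # Demo helper only: push a context back on the free list.
        ctx["next"] = self.free_head
        self.free_head = ctx
```

The point of the design is that a request no larger than the normal size costs only a list unlink when a recycled context is available, which is what keeps fine-grain task creation cheap.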



NEW_TRP -- Trap to generate a new object 

Takes the size of the object in R0 and the class in R1 and allocates a block
of memory for the object and assigns it a unique ID. The ID is
returned in R0. The header is tagged as an object header, and the
class/length field is filled in. The ID slot is filled with the
newly generated ID for this object. In addition, the XLATE cache
& BRAT are updated.





Runs under: A0 Absolute mode, Unchecked
Inputs:     R0,R1
Outputs:    R0
Trashes:    R1




NEW_TRP:
        PUSH    I                               ; Push int. disable flag
        PUSH    A2                              ; Save A2
        PUSH    R3                              ; Save R3
        MOVE    TRUE,R3                         ; R3 <- True
        MOVE    R3,I                            ; Disable interrupts
        CALL    TRAP_MALLOC                     ; Mallocate me some memory
        LSH     R1,SYS_LEN_BITS,R1              ; Shift class past len bits
        OR      R1,R0,R1                        ; Merge class & length
        WTAG    R1,TAG_OBJHEAD,R1               ; Tag class/length as objheader
        MOVE    R1,[0,A2]                       ; Fill 1st slot with class/len
        CALL    TRAP_GENID                      ; Generate an id into R0
        MOVE    A2,R1                           ; R1 <- Address of block
        ENTER   R0,R1                           ; Enter ID/ADDR in XLATE table
        MOVE    XCALL_BRAT_ENTER_NEW,R3         ; R3 <- BRAT EnterNew Xcall #
        CALL    TRAP_XCALL                      ; Enter in BRAT
        MOVE    R0,[1,A2]                       ; Fill 2nd slot with ID
        POP     R3                              ; Restore R3
        POP     A2                              ; Restore A2
        POP     I                               ; Pop int. disable flag
        POP     IP                              ; Return
NEW_TRP_END:
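
The class/length merge that NEW_TRP performs (shift the class past the length bits, then OR in the length) can be illustrated with a small Python sketch. The width LEN_BITS = 10 is an assumption made only for this example; the listing gives the symbol SYS_LEN_BITS but not its value, and the object-header tag is not modelled.

```python
LEN_BITS = 10  # assumed width; stands in for SYS_LEN_BITS


def make_header(cls, length, len_bits=LEN_BITS):
    """Pack a class number and an object length into one header word."""
    if not 0 <= length < (1 << len_bits):
        raise ValueError("length does not fit in the length field")
    return (cls << len_bits) | length


def header_length(header, len_bits=LEN_BITS):
    """Recover the length field (the AND with a length mask in the listing)."""
    return header & ((1 << len_bits) - 1)


def header_class(header, len_bits=LEN_BITS):
    """Recover the class field by shifting the length bits away."""
    return header >> len_bits
```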







find an object on. Enter with the ID of the object in R1
and exit with the node number in R1.



Runs under: A0 Absolute mode
Inputs:     R1
Outputs:    R1



ID_TO_NODE_TRP:
        PUSH    R2
        XLATE   R1,R1,XLATE_ID_TO_NODE          ; XLATE locally, nil if unbound
        CHECK   R1,TAG_ADDR,R2                  ; Does tag = ADDR?
        BF      R2,^ID_TO_NODE_EXIT             ; If not, we are done
ID_TO_NODE_LOCAL:
        MOVE    NNR,R1                          ; R1 <- This node number
ID_TO_NODE_EXIT:
        POP     R2                              ; Restore R2
        POP     IP                              ; Return
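
A rough Python reading of the lookup is sketched below. It combines the trap with the fallback that the translation fault handler applies later in this listing (taking the birth node from the upper bits of the ID). The field widths ID_BITS and NODE_MASK are assumptions for illustration; the listing names SYS_ID_ID_BITS and SYS_ID_NODE_MASK without giving values.

```python
ID_BITS = 16        # assumed width of the per-node serial part of an ID
NODE_MASK = 0xFFFF  # assumed mask for the birth-node field


def id_to_node(obj_id, local_addresses, this_node):
    """Return the node on which to look for the object with this ID.

    local_addresses is a hypothetical dict standing in for the local
    ID -> address bindings.
    """
    if obj_id in local_addresses:
        # The ID translates to an address here, so the object is local.
        return this_node
    # Otherwise fall back to the node encoded in the ID itself.
    return (obj_id >> ID_BITS) & NODE_MASK
```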



MALLOC_TRP -- Primitive memory allocator



memory. 
If the block 
be called wit 



Runs under: A0 shadow, unchecked
Input:      R0
Output:     A2



MALLOC_TRP:
        PUSH    R0
        PUSH    R1
        PUSH    R2
        PUSH    R3
        MOVE    R0,R1                           ; Copy length to R1
        DC      VAR_FREETOP                     ; R0 <- Offset to VAR_FREETOP
        MOVE    [R0,A0],R2                      ; R2 <- VAR_FREETOP
        ADD     R2,R1,R3                        ; R3 <- address + length
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to VAR_BRAT_BASE
        MOVE    [R0,A0],R0                      ; R0 <- Base of brat
        GE      R3,R0,R0                        ; Would new block be too big?
        BT      R0,^_MALLOC_BAD                 ; If so, treat it as an error
        LSH     R2,SYS_LEN_BITS,R0              ; Shift freetop base up
        OR      R0,R1,R0                        ; Merge in the length field
        OR      R0,SYS_REL_MASK,R0              ; Mark address as relocatable
        WTAG    R0,TAG_ADDR,R0                  ; Cast into an ADDR
        MOVE    R0,A2                           ; Copy to A2
        DC      VAR_FREETOP                     ; R0 <- VAR_FREETOP
        MOVE    R3,[R0,A0]                      ; Update new freetop
        POP     R3
        POP     R2
        POP     R1
        POP     R0
        POP     IP
_MALLOC_BAD:
        CALL    TRAP_DIE                        ; Die for now
MALLOC_TRP_END:
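
MALLOC_TRP is a bump allocator: it hands out the block that starts at the current free-top and advances the free-top by the requested length, failing if the block would reach the base of the BRAT. A minimal Python sketch follows. The class and exception names are hypothetical, and the boundary test (greater-or-equal) is an assumption, since the compare opcode is not legible in the listing.

```python
class OutOfMemory(Exception):
    """Raised where the listing calls TRAP_DIE."""


class BumpAllocator:
    def __init__(self, heap_start, limit):
        self.free_top = heap_start  # plays the role of VAR_FREETOP
        self.limit = limit          # plays the role of the BRAT base

    def malloc(self, length):
        new_top = self.free_top + length
        # Assumed boundary test: fail if the block would reach the limit.
        if new_top >= self.limit:
            raise OutOfMemory("heap would run into the BRAT")
        base = self.free_top
        self.free_top = new_top
        # The listing packs base and length into one ADDR word; a tuple
        # stands in for that here.
        return (base, length)
```

Nothing is ever returned to this allocator; reclaiming space is left to the separate SWEEP routine, which is not shown in this part of the listing.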



FREE_CONTEXT -- Free up the context in ID1

If the size of the context equals the normal fast context size, then
the context block is put back onto the free list after allocating a



Runs under: A0 Absolute Mode
Input:      ID1
Trashes:

FREE_CONTEXT_TRP:
        PUSH    R0
        PUSH    R1
        MOVE    ID1,R0
        CALL    TRAP_FREE_SPECIFIED_CONTEXT
        POP     R1
        POP     R0
        POP     IP
FREE_CONTEXT_TRP_END:



FREE_SPECIFIED_CONTEXT -- Free up the context specified in R0

If the size of the context equals the normal fast context size, then
the context block is put back onto the free list ...
replies). Otherwise,



Runs under: A0 Absolute Mode
Input:      R0
Trashes:    R0,R1



FREE_SPECIFIED_CONTEXT_TRP:
        PUSH    A2                              ; Save A2
        XLATE   R0,A2,XLATE_OBJ                 ; A2 <- Addr of context
        MOVE    [OBJECT_HDR,A2],R1              ; R1 <- Header of context
        AND     R1,SYS_LEN_MASK,R1              ; R1 <- Length of context
        SUB     R1,4,R1                         ; Subtract 4 first words
        SUB     R1,CONT_PSTATE_SIZE,R1          ; R1 <- User space size
        EQUAL   R1,CONT_NORMAL_SIZE,R1          ; Is user space = normal size?
        BT      R1,^FREE_CONTEXT_TRP_KEEP_HIM   ; If so, add him to the list
        MOVE    [OBJECT_HDR,A2],R1              ; R1 <- Header of context
        OR      R1,SYS_MARK_MASK,R1             ; Set deletion bit
        MOVE    R1,[OBJECT_HDR,A2]              ; Move hdr back to object
        BR      ^FREE_CONTEXT_TRP_EXIT

FREE_CONTEXT_TRP_KEEP_HIM:

        *** No longer need to generate new ID ***

        PURGE   R0                              ; Remove ID R0 from cache
        PUSH    I
        PUSH    R3                              ; Save R3
        MOVE    TRUE,R3                         ; R3 <- True
        MOVE    R3,I                            ; Disable interrupts
        MOVE    XCALL_BRAT_PURGE,R3             ; R3 <- Purge Xcall #
        CALL    TRAP_XCALL                      ; Remove ID from BRAT
        CALL    TRAP_GENID                      ; Make a new ID
        MOVE    R0,[OBJECT_ID,A2]               ; Patch new ID into context
        MOVE    A2,R1                           ; R1 <- Context ADDR
        ENTER   R0,R1                           ; Make new cache binding
        MOVE    A2,R1                           ; R1 <- Context Address
        MOVE    XCALL_BRAT_ENTER,R3             ; R3 <- Enter Xcall #
        CALL    TRAP_XCALL                      ; Enter binding in BRAT
        POP     R3                              ; Restore R3
        POP     I                               ; Restore interrupts
        DC      VAR_CFREE_LIST                  ; R0 <- Offset to CFREE list
        MOVE    [R0,A0],R1                      ; R1 <- CFREE base
        MOVE    R1,[CONT_NEXT_CONTEXT,A2]       ; Put CFREE list as next ctxt
        MOVE    [OBJECT_ID,A2],R1               ; R1 <- Object ID
        MOVE    R1,[R0,A0]                      ; CFREE list <- Context ID
FREE_CONTEXT_TRP_EXIT:
        POP     A2                              ; Restore A2
        POP     IP                              ; Return
FREE_SPECIFIED_CONTEXT_TRP_END:
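
The two outcomes of FREE_SPECIFIED_CONTEXT (recycle a normal-size context under a fresh ID, or simply mark any other context as deleted) can be sketched as follows. All names here are hypothetical stand-ins: NORMAL_SIZE for CONT_NORMAL_SIZE, the counter for TRAP_GENID, and a dictionary for the context object. The real free list holds context IDs rather than object references.

```python
import itertools

NORMAL_SIZE = 8                      # hypothetical stand-in for CONT_NORMAL_SIZE
_fresh_ids = itertools.count(1000)   # hypothetical stand-in for TRAP_GENID


def free_context(ctx, pool):
    """Recycle a normal-size context, or mark any other context deleted."""
    if len(ctx["space"]) == NORMAL_SIZE:
        # Give the block a new ID so that references to the old ID no
        # longer resolve to it, then push it on the front of the free list.
        ctx["id"] = next(_fresh_ids)
        ctx["next"] = pool["head"]
        pool["head"] = ctx
        return "recycled"
    # Odd-sized contexts are not reused: set the deletion mark instead.
    ctx["deleted"] = True
    return "marked"
```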
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VERSION.TRP -- Return the version number 



Returns the ROM version number in R0. The version number is an INT tagged
value. The high 16 bits hold the major version number and the low 16
bits hold the minor version number.



Runs under: A0 Absolute Mode
Output:     R0
Trashes:    Internally: R0
            Totally:    R0

VERSION_TRP:
        DC      ROM_VERSION
        MOVE    [R0,A0],R0
        POP     IP
VERSION_TRP_END:
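
The packing described above matches the ROM constant INT:(1<<16)|0 defined near the end of this listing. A small Python sketch of packing and unpacking such a word (the function names are illustrative only):

```python
def pack_version(major, minor):
    """Major version in the high 16 bits, minor version in the low 16 bits."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def unpack_version(word):
    """Split a packed version word back into (major, minor)."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF
```

For example, pack_version(1, 0) gives the same value as (1<<16)|0, and unpack_version recovers (1, 0) from it.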



XFER_x_TRP -- Transfer execution to a context

The routines XFER_ID_TRP and XFER_ADDR_TRP both transfer control to a context,
identified either by ID or by physical address. To transfer by ID,
enter with the ID in R0. To transfer by address, enter with the address in A1.

Runs under: A0 Absolute Mode

XFER_ID_TRP
        Input:   R0
        Trashes: Locally: R0,A0,A1
                 Totally: R0,A0,A1

XFER_ADDR_TRP
        Input:   A1
        Trashes: Locally: R0,A0
                 Totally: R0,A0

Never returns.



XFER_ID TRP: 

XLATE 
XFER_ADDR TRP: 
PUSH 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
LSH 
ADO 
LSH 
ADO 
ADO 

MOVE .._.„ 
XFER_A0OR CLR STACK: 
MOVE 0,R0 
WRITER RO.SP 



R0,A1,XLATE_OBJ 

I 

TRUE.RO 

R0,I 

[08JECT_I0,A1],R0 

R0.ID1 

R0,[7,A0] 

A1.R0 

R0,-SYS_LEN BITS.RO 

R0.[CONT_PSTATE 0FFSET.A1 1.R0 

R0.SYS_LEN_8ITS,R0 

R0,CCONT_PSTATE OFFSET.A11.R0 

R0,1,R0 

R0.A1 



MOVE 
PUSH 

MOVE 

WRITER 

MOVE 

WRITER 

MOVE 

WRITER 

MOVE 
MOVE 
MOVE 
MOVE 

PUSH 

PUSH 

MOVE 

CALL 

POP 

POP 

INVAL 



[PSTATE_IP,A1],R0 
RO 

[PSTATE_IDO.A1],R0 
RO.IOO 

[PSTATE_ID2.A1J.ro 
R0.ID2 

CPSTATE_ID3.A1],R0 
R0.ID3 

[PSTATE R0,A1].R0 
[PSTATE_R1,A1],R1 
[PSTATE_R2,A1].R2 
[PSTATE_R3.A1],R3 

RO 

R1 

[OBJECT_I0,A1],R0 

TRAP_FREE CONTEXT 

R1 

RO 



Get context addr In A1 



RO <- True 

Disable interrupts 

RO <- Context ID 

Set ID1 to context ID 

Store 1n current context ID 

RO <- Pointer to context 

Shift addr field down 

Add In offset to pstate 

Shift addr field up 

Add 1n pstate length - 1 

RO <- ADDR:<ps_addrXps len> 

A1 <- Pointer to pstate - 

RO <- 

Flush stack preparing 
for context resume 
RO <- Old IP from context 
Push IP on stack 



Save RO 

Save R1 

RO <- Context ID 

Free context 

Restore R1 

Restore RO 

Invalidate address regs 



MOVE 

BNIL 

POP 

XLATE 

POP 

XFER ADOR TRP END 
XFER 10 TRP END: 



IDO.RO 

RO . ~XFER_AOOR_CLR_STACK 

RO.AO, XLATE METHOD 
IP 



; RO <- Method-ID from context 

; If IDO slot ml, don't XLATE 

; AO <- Address of method 

; Transfer execution to context 



BRAT_PEEK_TRP -- Finds the current slot of the ID in the BRAT

Runs under: A0 Absolute Mode, Unchecked
Inputs:     R0,R1,A2
Output:     R0

The ID to hash to give the first offset to start searching from is in
R0. The ID to search for is in R1, and the base of the BRAT is in A2.
The two IDs can be different ...

If the ID is not in the BRAT, NIL is returned in R0.



BRAT_PEEK_TRP:
        PUSH    R2
        PUSH    R3

        ; Convert the ID into an initial offset key into the BRAT

        WTAG    R0,TAG_INT,R0                   ; Cast R0 into an INT
        LSH     R0,-8,R2                        ; R2 <- ID >> 8
        XOR     R0,R2,R3                        ; R3 <- ID xor (ID >> 8)
        LSH     R2,-8,R2                        ; R2 <- ID >> 16
        XOR     R0,R2,R3                        ; R3 <- R0 xor (ID >> 16)
        LSH     R2,-8,R2                        ; R2 <- ID >> 24
        XOR     R0,R2,R3                        ; R3 <- R0 xor (ID >> 24)
        LSH     R3,1,R3                         ; R3 <- key * 2 = offset
        DC      VAR_BRAT_HASH_MASK              ; R0 <- Offset to hash mask
        MOVE    [R0,A0],R0                      ; R0 <- mask
        AND     R3,R0,R3                        ; Now R3 holds key into BRAT

        ; Find the table length

        DC      SYS_LEN_MASK
        AND     R0,A2,R2                        ; R2 <- BRAT length

        ; Search for the ID starting at offset

_BRAT_PEEK_LOOP:
        BZ      R2,^_BRAT_PEEK_FAIL             ; If no more length, fail
        EQ      R1,[R3,A2],R0                   ; Have we found the target?
        BT      R0,^_BRAT_PEEK_GOT_HIM
_BRAT_PEEK_NEXT:
        SUB     R2,2,R2                         ; Decrement length left
        SUB     R3,2,R3                         ; Decrement current offset
        LT      R3,0,R0                         ; Is offset < 0?
        BF      R0,^_BRAT_PEEK_LOOP             ; If not, loop

        ; We must wrap around to top of BRAT

        DC      SYS_LEN_MASK
        AND     R0,A2,R3                        ; R3 <- Length of BRAT
        SUB     R3,2,R3                         ; Point to top ID slot in BRAT
        BR      ^_BRAT_PEEK_LOOP

        ; If ID not in table, we end up here

_BRAT_PEEK_FAIL:
        MOVE    NIL,R3                          ; R3 <- NIL
_BRAT_PEEK_GOT_HIM:
        MOVE    R3,R0                           ; R0 <- Offset of ID in BRAT
        POP     R3
        POP     R2
        POP     IP
BRAT_PEEK_TRP_END:
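
The search in BRAT_PEEK is a hashed probe that walks downward two words at a time (one ID/ADDR pair per step), wraps from the bottom of the table to the top, and gives up only after the whole table length has been covered. The Python sketch below models that loop. Two things are assumptions: the hash is written as an XOR-fold of the ID's bytes, which is one reading of the comments rather than a literal transcription of the register usage, and hash_mask is taken to be the table length minus one for a power-of-two table.

```python
NIL = None


def initial_offset(ident, hash_mask):
    """XOR-fold the ID, double it to index ID/ADDR pairs, then mask."""
    key = ident ^ (ident >> 8) ^ (ident >> 16) ^ (ident >> 24)
    return (key << 1) & hash_mask


def brat_peek(table, hash_id, target, hash_mask):
    """Return the offset of the slot holding target, or NIL if absent.

    table is a flat list laid out as [id0, addr0, id1, addr1, ...].
    hash_id picks the starting slot and target is the value searched
    for, so passing NIL as target finds an empty slot for hash_id.
    """
    offset = initial_offset(hash_id, hash_mask)
    remaining = len(table)
    while remaining != 0:
        if table[offset] == target:
            return offset
        remaining -= 2
        offset -= 2
        if offset < 0:
            offset = len(table) - 2  # wrap around to the top ID slot
    return NIL
```

Because the loop never stops early at an empty slot, writing NIL over a purged entry cannot hide other entries that were placed further along the same probe sequence.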



EXTENDED CALL ROUTINES



BRAT_ENTER_XTRP -- Add an ID/ADDR pair to the BRAT

Runs Under: A0 Absolute Mode, Unchecked Mode
Inputs:     R0,R1

Takes an ID/ADDR pair in R0 & R1 and enters the pair into the BRAT.



BRAT_ENTER_XTRP:
        PUSH    A2
        PUSH    R3
        PUSH    R2
        PUSH    R1
        PUSH    R0

        MOVE    R0,R2                           ; R2 <- ID
        MOVE    R1,R3                           ; R3 <- ADDR
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to BRAT variable
        MOVE    [R0,A0],R1                      ; R1 <- BRAT_BASE
        DC      SYS_LEN_BITS
        LSH     R1,R0,R1                        ; Shift BRAT_BASE to addr field
        DC      VAR_BRAT_LENGTH
        OR      R1,[R0,A0],R1                   ; R1 <- BRAT base | length
        WTAG    R1,TAG_ADDR,R1                  ; Cast R1 into an ADDR
        MOVE    R1,A2                           ; Move BRAT ptr into A2
        MOVE    R2,R0                           ; R0 <- ID that was passed in
        MOVE    R0,R1                           ; R1 <- ID that was passed in
        CALL    TRAP_BRAT_PEEK                  ; Find offset & return in R0
        BNNIL   R0,^_BRAT_ENTER_OK              ; If offset != nil, we got ID
        MOVE    R1,R0                           ; R0 <- ID (still in R1)
        MOVE    NIL,R1                          ; R1 <- NIL
        CALL    TRAP_BRAT_PEEK                  ; Find offset & return in R0
        BNNIL   R0,^_BRAT_ENTER_OK              ; If offset non nil, still room
        CALL    TRAP_DIE                        ; If no room, die for now.
_BRAT_ENTER_OK:
        MOVE    R2,[R0,A2]                      ; Put ID in 1st slot
        ADD     R0,1,R0
        MOVE    R3,[R0,A2]                      ; Put ADDR in 2nd slot

        POP     R0
        POP     R1
        POP     R2
        POP     R3
        POP     A2
        POP     IP

BRAT_ENTER_XTRP_END:



BRAT_ENTER_NEW_XTRP -- Add a new ID/ADDR pair to the BRAT

Runs Under: A0 Absolute Mode, Unchecked Mode
Inputs:     R0,R1

Takes an ID/ADDR pair in R0 & R1 and enters the pair into the BRAT. The
caller must be sure that the ID is not already in the BRAT, because
no search is made for pre-existence. This routine is intended to
be a faster way to enter initial bindings, as in a NEW call.



BRAT_ENTER_NEW_XTRP:
        PUSH    A2
        PUSH    R3
        PUSH    R1
        PUSH    R0
        PUSH    R0                              ; Save R0
        MOVE    R1,R3                           ; R3 <- ADDR
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to BRAT variable
        MOVE    [R0,A0],R1                      ; R1 <- BRAT_BASE
        DC      SYS_LEN_BITS
        LSH     R1,R0,R1                        ; Shift BRAT_BASE to addr field
        DC      VAR_BRAT_LENGTH
        OR      R1,[R0,A0],R1                   ; R1 <- BRAT base | length
        WTAG    R1,TAG_ADDR,R1                  ; Cast R1 into an ADDR
        MOVE    R1,A2                           ; Move BRAT ptr into A2
        POP     R0                              ; R0 <- ID that was passed in
        MOVE    NIL,R1                          ; R1 <- NIL (find empty slot)
        CALL    TRAP_BRAT_PEEK                  ; Find offset & return in R0
        BNNIL   R0,^_BRAT_ENTER_NEW_OK          ; If offset non nil, still room
        CALL    TRAP_DIE                        ; If no room, die for now.
_BRAT_ENTER_NEW_OK:
        POP     R1                              ; R1 <- ID
        PUSH    R1                              ; Push ID back on stack
        MOVE    R1,[R0,A2]                      ; Put ID in 1st slot
        ADD     R0,1,R0
        MOVE    R3,[R0,A2]                      ; Put ADDR in 2nd slot
        POP     R0
        POP     R1
        POP     R3
        POP     A2
        POP     IP

BRAT_ENTER_NEW_XTRP_END:



BRAT_XLATE_XTRP -- Xlate an ID from the BRAT into an ADDR

Runs Under: A0 Shadow, Unchecked Mode
Inputs:     R0
Output:     R0

Takes the ID to lookup in the BRAT in R0. When the corresponding
ADDR value is found, it is returned in R0.



BRAT_XLATE_XTRP:
        PUSH    A2
        PUSH    R2
        PUSH    R1

        MOVE    R0,R2                           ; R2 <- ID
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to BRAT variable
        MOVE    [R0,A0],R1                      ; R1 <- BRAT_BASE
        DC      SYS_LEN_BITS
        LSH     R1,R0,R1                        ; Shift BRAT_BASE to addr field
        DC      VAR_BRAT_LENGTH
        OR      R1,[R0,A0],R1                   ; R1 <- BRAT base | length
        WTAG    R1,TAG_ADDR,R1                  ; Cast R1 into an ADDR
        MOVE    R1,A2                           ; Move BRAT ptr into A2

        MOVE    R2,R0
        MOVE    R2,R1
        CALL    TRAP_BRAT_PEEK                  ; Find offset & return in R0
        BNIL    R0,^_BRAT_XLATE_RETURN          ; If R0 nil return the nil
        ADD     R0,1,R0
        MOVE    [R0,A2],R0                      ; Pick out ADDR & return in R0
_BRAT_XLATE_RETURN:
        POP     R1
        POP     R2
        POP     A2
        POP     IP
BRAT_XLATE_XTRP_END:



BRAT_PURGE_XTRP -- Purge an ID/ADDR pair from the BRAT

Runs under: A0 Shadow, Unchecked Mode
Inputs:     R0

Enter with ID to purge in R0. The routine writes NIL into both
the ID & ADDR slot of the binding in the table.



BRAT_PURGE_XTRP:
        PUSH    A2
        PUSH    R2
        PUSH    R1
        PUSH    R0
        MOVE    R0,R2                           ; R2 <- ID
        DC      VAR_BRAT_BASE                   ; R0 <- Offset to BRAT variable
        MOVE    [R0,A0],R1                      ; R1 <- BRAT_BASE
        DC      SYS_LEN_BITS
        LSH     R1,R0,R1                        ; Shift BRAT_BASE to addr field
        DC      VAR_BRAT_LENGTH
        OR      R1,[R0,A0],R1                   ; R1 <- BRAT base | length
        WTAG    R1,TAG_ADDR,R1                  ; Cast R1 into an ADDR
        MOVE    R1,A2                           ; Move BRAT ptr into A2
        MOVE    R2,R0
        MOVE    R2,R1
        CALL    TRAP_BRAT_PEEK                  ; Find offset & return in R0
        BNIL    R0,^_BRAT_PURGE_RETURN          ; If ID not in table, return
        MOVE    R0,R1
        DC      SYM:0
        MOVE    R0,[R1,A2]
        ADD     R1,1,R1
        MOVE    R0,[R1,A2]
_BRAT_PURGE_RETURN:
        POP     R0
        POP     R1
        POP     R2
        POP     A2
        POP     IP
BRAT_PURGE_XTRP_END:
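
Taken together, the BRAT extended calls behave like a small open-addressed table: enter reuses the slot of an existing binding or else claims an empty one, xlate returns the ADDR word that follows a matching ID, and purge overwrites both words with NIL. The Python sketch below models that behaviour. The class name is hypothetical, and the placeholder hash (ID times two, modulo the table size) stands in for the ROM's hash function; raising MemoryError stands in for TRAP_DIE.

```python
class Brat:
    """Toy model of the BRAT as a flat list of [id, addr, id, addr, ...]."""

    def __init__(self, pairs):
        self.slots = [None] * (2 * pairs)

    def _peek(self, hash_id, target):
        # Probe downward one pair at a time, wrapping at the bottom,
        # until every pair has been examined once.
        n = len(self.slots)
        offset = (hash_id * 2) % n        # placeholder hash, not the ROM's
        for _ in range(n // 2):
            if self.slots[offset] == target:
                return offset
            offset = (offset - 2) % n
        return None

    def enter(self, ident, addr):
        offset = self._peek(ident, ident)      # existing binding?
        if offset is None:
            offset = self._peek(ident, None)   # otherwise an empty slot
        if offset is None:
            raise MemoryError("BRAT is full")
        self.slots[offset] = ident
        self.slots[offset + 1] = addr

    def xlate(self, ident):
        offset = self._peek(ident, ident)
        return None if offset is None else self.slots[offset + 1]

    def purge(self, ident):
        offset = self._peek(ident, ident)
        if offset is not None:
            self.slots[offset] = None
            self.slots[offset + 1] = None
```

BRAT_ENTER_NEW corresponds to skipping the first probe in enter, which is why its caller must guarantee the ID is not already present.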



MIGRATE_OBJECT_XTRP -- Takes an object ID and sends object to a node

The ID of the object to migrate is in R0, and the destination node
number is in R1. If the object is not local, a MIGRATE_OBJECT_MSG
message is sent to the residence of the object.

Runs under: A0 absolute mode, unchecked
Inputs:     R0, R1
Trashes:    R2, R3



MIGRATE OBJECT 
PUSH 
MOVE 
MOVE 
XLATE 
PUSH 
CHECK 
BT 
MIGRATE OBJECT 
SENO 
DC 

SEND 
POP 
SEND2E 
POP 
POP 
MIGRATE OBJECT 
PURGE " 
MOVE 
CALL 
AND 
DC 
ADO 
ADO 
SEN02 
POP 
SENO 
MOVE 
SENO 
MOVE 
MIGRATE OBJECT 
MOVE 
SUB 
BZ 

SENO 
ADO 
BR 
MIGRATE OBJECT 
SENOE ' 
X 
OR 
MOVE 
POP 
POP 
MIGRATE_OBJECT 



XTRP: 
I 

TRUE.R2 

R2.I 

R0.R2. XLATE ID TO NOOE 

RO " ~ 

R2.TAG ADOR.R3 

R3,*MIGRATE_OBJECT LOCAL 
.FORVARO MESSAGE: 

R2 

MSG: (MIGRATE OBJECT MSG«SYS 

RO 

RO 

R0.R1 

I 

IP 
LOCAL: 

RO 

XCALL BRAT PURGE, R3 

TRAP XCALL 
R2.SYS LEN MASK.R3 
MSG :SYS_UNC I (IMMIGRATE OBJECT 
R0.R3.R0 
R0.3.R0 
R1.R0 
RO 
RO 

NNR.RO 
RO 
0.R0 
.LOOP: 
R2.A2 
R3.1.R3 

R3, 'MIGRATE OBJECT LAST 
CR0.A2] 
R0.1.R0 

-MIGRATE OBJECT LOOP 
LAST: 
[R0.A2] 

TAG_OBJHEAO:SYS MARK MASK 
RO,C0.A2],R0 
RO,[0,A2] 
I 

IP 
XTRP_ENO: 



; Save old I-01sable flag 

; R2 <- True 

; Disable Interrupts 

; R2 <- Address of ID 1n RO 

: Save ID 

; Is object local? 

; If so, migrate H 

; Send residence node # 
.LEN_BITS)|3 

; Send message header 
; Restore object 10 
; Send object 1d 4 node t 
; Restore interrupts 
; Return 

; Remove binding from cache 

; R3 <- Purge Xcall t 

; Purge RO from BRAT 

; R3 <- Length of object 
MSG«SYS_LEN_BITS) 

; Add length of object 

; Add 3 for hdr. ID, this node 

; Send node t, header 

; RO <- ID 

; Send ID 

; RO <- This nod^ t 

; Send this node number 

; Current Index • 

; Copy object address to A2 

; Decrement length 

; If length ■ 0, send last word 

; Mall out object word 

i Increment Index 

; Loop 

; Send final object word 

; RO <- Deletion mark mask 

; Mark header deleted 

; Store back into header 

; Restore Interrupts 

; Return 
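
Going by the comments in the local case above, the message that carries a migrating object consists of the destination node, a header whose length field is the object length plus 3 (header, ID, this node), the object ID, the sending node number, and then the object's words. The Python sketch below builds such a word list. The function name, the opcode argument and the 10-bit length field are assumptions for illustration, and message tags and flags are not modelled.

```python
def build_immigrate_message(dest_node, obj_id, this_node, obj_words,
                            opcode, len_bits=10):
    """Return the words handed to the network for one migrating object."""
    # Length field counts the header, the ID, this node, and the object.
    header = (opcode << len_bits) + len(obj_words) + 3
    return [dest_node, header, obj_id, this_node, *obj_words]
```

After the last word is sent, the listing marks the local copy's header as deleted, so the object exists in only one place once the message is delivered.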



.««..»«,...„,„.,,,„.„„,„ «.««««««.«„„.„„„„„„„„„ 

EXCEPTION HANDLERS



INVADR_EXC -- Exception handler for access of an Ax register with I bit set
Runs under: A0 absolute mode, unchecked



INVAOR EXC: 
~ PUSH 
PUSH 
PUSH 
PUSH 
MOVE 
DC 
AND 
DC 
LSH 
EQUAL 
BT 

EQUAL 
8T ... ... 

INVADR_EXC NORMAL OPO: 
MOVE 0,R3 
DC X11 
AND R2.R0.R2 
BR ~INVADR EXC REXLATE 



RO 
R1 

R2 

R3 

TRP.H3 

SYS_OP0 MASK 

R3,R0 r RZ" 

-(SYSJ3P0 BITS ♦ 2 ♦ 2) 

R3.R0.R1 

R1.2.R0 

R0,~INVADR_EXC_REG ORIENTED 

R1.3.R0 

RO, ~INVA0R EXC REG ORIENTED 



: R3 <- Faulting Instruction 
! RO <- Mask to keep OPO field 
: R2 <- OPO field 
; RO <- Bits to shift down 

R1 <- Opcode 

Is opcode 2 (REAOR)T 

If so, treat OPO special 

Is opcode 3 (WRITER)? 

If so, treat OPO special 

R3 <- (means curr. priority) 
Mask to keep Ax bits 
R2 <- A Index 
Re-translate IOx -> Ax 



INVADR EXC REG ORIENTED: 

LSH R2,-(SYS_OP0 BITS - 1).R3 
OC X11 
ANO R2.RO.R2 
INVADR EXC REXLATE: 

LSH R3.2.R3 
OR R3 R2 R3 
INVAOR EXC DISPATCH ON PAA: 

BR R3 
INVADR_EXC_ID_LOADERS: 
MOVE IOO, RO 

"INVADR EXC XLATE 

ID1.R0 

"INVAOR EXC XLATE 

ID2.R0 " 

"INVADR_£XC XLATE 

103, RO 

"INVAOR EXC XLATE 

IOO'.RO 

"INVADR EXC XLATE 

ID1*,R0 

"INVAOR EXC XLATE 

102 \R0 
"INVAOR_EXC XLATE 

103 \R0 
INVAOR.EXC XLATE 



; R3 <- Relative priority 
; Mask to keep Ax bits 
i R2 <- A index 



BR 

MOVE 
BR 

MOVE 
BR 

MOVE 
BR 

MOVE 
BR 
MOVE 
BR 

MOVE 
BR 

MOVE 
BR 
INVAOR.EXC XLATE 

XLATE RO , R1 , XLATELOCAL 



; R3 <- 


(PAA) 


Branch 


forward R3 word] 


RO <- 


IOO 




Branch 


and 


XLATE 


RO <- 


101 




Branch 


and 


XLATE 


RO <- 


102 




Branch 


and 


XLATE 


RO <- 


103 




Branch 


and 


XLATE 


RO <- 


IOO' 




Branch 


and 


XLATE 


RO <- 


101' 




Branch 


and 


XLATE 


RO <- 


ID2- 




Branch 


and 


XLATE 


RO <- 


ED3' 




Branch 


and XLATE 


R1 <- 


Addr, 


Int. or NIL 



What if object isn't here! If XLATE faults, we don't save stacks!



EARLY_EXC -- Exception handler for early queue access

Runs under: A0 shadow
Trashes:    TEMP0

EARLY_EXC:
        MOVE    R0,[TEMP0,A0]                   ; Save R0 in TEMP0
        POP     R0                              ; R0 <- Return Address
        WTAG    R0,TAG_INT,R0                   ; Cast into an INT
        LSH     R0,-9,R0                        ; Shift R0 to LSBits
        SUB     R0,1,R0                         ; Back up address/phase
        LSH     R0,9,R0                         ; Shift address field back
        WTAG    R0,TAG_IP,R0                    ; Cast back into an IP
        PUSH    R0                              ; Push return IP on stack
        MOVE    [TEMP0,A0],R0                   ; Restore R0
        POP     IP                              ; Retry instruction
EARLY_EXC_END:
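
The retry trick used here (and again in SEND_EXC and the translation fault handler) is to shift the saved IP right until the phase bit is the least significant bit, subtract one, and shift back, which steps the IP back by one instruction phase. A Python sketch of that arithmetic follows. It assumes, from the shift count in the listing, that the phase bit sits at bit 9 with the address field above it; the tag casts are not modelled.

```python
PHASE_SHIFT = 9  # shift count used in the listing


def back_up_one_phase(ip_word):
    """Step a packed IP back by one phase so the instruction is retried.

    The low PHASE_SHIFT bits are discarded by the shifts, as in the
    listing.
    """
    return ((ip_word >> PHASE_SHIFT) - 1) << PHASE_SHIFT
```

Backing up from phase 1 of a word lands on phase 0 of the same word, while backing up from phase 0 borrows from the address field and lands on phase 1 of the previous word.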



SEND_EXC -- Exception handler for send buffer overflow

Runs under: A0 shadow
Trashes:    TEMP0

SEND_EXC:
        MOVE    R0,[TEMP0,A0]                   ; Save R0 in TEMP0
        POP     R0                              ; R0 <- Return Address
        WTAG    R0,TAG_INT,R0                   ; Cast into an INT
        LSH     R0,-9,R0                        ; Shift R0 to LSBits
        SUB     R0,1,R0                         ; Back up address/phase
        LSH     R0,9,R0                         ; Shift address field back
        WTAG    R0,TAG_IP,R0                    ; Cast back into an IP
        PUSH    R0                              ; Push return IP on stack
        MOVE    [TEMP0,A0],R0                   ; Restore R0
        POP     IP                              ; Retry instruction
SEND_EXC_END:



XLATE_EXC — Exception handler for translation fault 



Runs under: 
Trashes: 



AO Absolute Mode, Unchecked 
TEMPO-4 



XLATE.EXC: 

MOVE 
MOVE 
MOVE 
MOVE 

REAOR 
WTAG 



R0,[ TEMPO, AO] 
R1,[ TEMPI, AO] 
R2,[TEMP2,A0] 
R3.CTEMP3.A0] 

TRP.RO 
R0,TAG_INT,R0 



Save data registers In 
TEMPO - TEMP3 for use 
as an array 

RO <- Current priority TRP 



MOVE RO.fTEMPA.AO] 

LSM R0,-7,R0 

AND R0.X11.RO 

ADO RO.TEMPO.RO 

MOVE [RO,AO],RO 

MOVE R0.R1 

MOVE XCALL.BRAT XLATE.R3 

CALL TRAP_XCALL 

8NIL R0,*XLATE_EXC_NO_BIN0ING 

ENTER R1.R0 



TEMP4 <- Current priority TRP 
Pick out src. register field 

Add TEMPO as start of array 
Load RO with source 10 
Copy ID to R1 



See 1f ID 1s 1n BRAT 

If not, handle no binding 

Enter pur in cache 



XLATE RETRY: 
POP 
LSH 
SUB 
LSH 
PUSH 

MOVE 
MOVE 
MOVE 
MOVE 
POP 



R3 

R3.-9.R3 

R3.1.R3 

R3.9.R3 

R3 

C TEMPO, A0],R0 
[TEMPI, A0],R1 
[TEMP2,A0].R2 
[TEMP3,A0],R3 
IP 



XLATE_EXC_NO_BINDING: 



MOVE 

LSH 

DC 

AND 

EQUAL 

BT 

EQUAL 

BT 

EQUAL 

BT 

XLATE EXC LOCAL: 
MOVE 
DC 
AND 
A 00 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
POP 



R3 <- Return IP 

Shift IP until phase 1s LSB 

Back up one phase 

R3 <- Failed Inst. IP 

Put retry IP on stack 

Restore data registers 



Retry failed Instruction 



[TEMP*,A0].R0 

R0,-(SYS OPO BITSfSYS_OP1 BITS), 

(1 « SYS OP2 BITS) - 1 

R2.R0.R2 

R2, XLATE OBJ.RO 

RO,~XLATE EXC OBJ MODE 

R2, XLATE ID TO NOOE.RO 

RO.~XLATE_EXC ID TO NODE MODE 

R2, XLATE METHOO.RO " 

R0,~XLATE EXC.METHOO MODE JUMP 



R2 



RO <- Failed Instruction 
> 

RO <- mask to keep op2 field 

R2 <- XLATE mode from op2 

Were we 1n XLATE_OBJ mode? 

If so, branch 

Were we 1n XLATE_ID_T0_NO0E? 

If so, branch 

Were we 1n XLATE_METHOO mode? 

If so, branch 



TRP.R1 

%1 111111 

R1.R0.R2 

R2, TEMPO, R2 

NIL.RO 

R0.CR2.A0] 

C TEMPO, A0],R0 

[TEMPI, A0J.R1 

[TEMP2,A0],R2 

[TEMP3,A0],R3 

IP 



<«• Dest must be a data register! >*« 
R1 <- Failed XLATE 
RO <- Mask to keep Oest field 
R2 <- Oest field of XLATE 
R2 <- TempOCOest] 
RO <- NIL 
TempOCOest] <- NIL 
Restore data registers 



XLATE_EXC OBJ MODE: 

CALL TRAP_0IE 

XLATE_EXC_METHOO MOOE JUMP: 

BR ~XLATE_EXC_METHOO_MODE 



; Return 



Just die for now 



Jump extender 



XLATE EXC ID 


TO NODE MOOE: 


MOVE" 


TRP.R1 


LSH 


R1.-7.R1 


ANO 


R1.%11.R1 


AOD 


R1, TEMPO, R1 


MOVE 


[R1,A0],R1 


LSH 


R1.-SYS ID 10 BITS.R1 


ANO 


R1.SYS ID NOOE MASK.R1 


MOVE 


TRP.R2 


DC 


%1111111 


ANO 


R2.R0.R2 


ADO 


R2. TEMPO, R2 


MOVE 


R1,[R2,A0] 


MOVE 


[TEMPO. A0],R0 


MOVE 


CTEMP1,A0],R1 


MOVE 


[TEMP2,A0],R2 


MOVE 


[TEMP3,A0],R3 


POP 


IP 



Rl <- Failed XLATE 

Shift Source bits down 

Just keep source bits 

R1 <- TEMPO ♦ Rs 

R1 <- Source ID 

Shift Blrthnode number down 

Just keep node number field 

R2 <- Failed XLATE 

RO <- Mask to keep Dest field 

R2 <- Dest field of XLATE 

R2 <- TEMPO ♦ Oest (Rx only!) 

TEMPT. Dest] • blrthnode number 

Restore data registers 



Return 



XLATE EXC METHOD MOOE: 
POP R3 
LSH R3.-9.R3 
SUB R3.1.R3 
LSH R3.9.R3 



; Shift IP until phase 1s LSB 

; Back up one phase 

; R3 <- Failed 1nst. IP 



; Now Rl holds source ID. 4 retry IP 1s 1n R3 



XLATE EXC SAVE_MSG: 
PUSH R1 
PUSH 102 

MOVE [O.A3],R2 



; Save away R1 

; Push ID2 on stack 

; R2 <- Message header 



DC 

ANO 

ADO 

MOVE 

CALL 

XLATE 

PUSH 



SYS_LEN MASK 

R0.R2.R2 

R2.2.R0 

CLASS_MESSAGE,R1 

TRAP NEW 

R0.A2, XLATE OBJ 

RO 



ADO R2.2.R1 

XLATE_EXC COPY MSG: 

BZ R2,~XLATE_EXC MAKE CONTEXT 

SUB R2.1.R2 

SUB R1.1.R1 

MOVE [R2.A3],R0 

MOVE R0,[R1,A2] 

BR ~XLATE_EXC_COPY_MSG 

XLATE_EXC_MAKE_CONTEXT: 



MOVE 

CALL 

PUSH 

MOVE 

MOVE 

MOVE 

LSH 

ADO 

LSH 

AOO 

ADO 

MOVE 



0.R0 

TRAP_NEW_CONTEXT 

I 

TRUE.RO 

RO,I 

A1.R0 

RO.-SYS_LEM_BITS,RO 

R0,CCONT_PSTATE OFFSET.A21.R0 

RO,SYS_LEN_BITS,RO 

RO , CCOMT_PSTATE_OFFSET. A2 ]. RO 

R0.1.R0 

R0.A2 



AO -> 777? 100 -> 777? 

A1 -> Context 101 -> Context 

A2 -> Pstate ID2 -> 7777 

A3 -> 777? 103 -> 7777 



MOVE 



POP 

MOVE 

POP 

MOVE 

REAOR 

MOVE 

REAOR 

MOVE 



MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 

CHECK 
BF 



Fill IP slot of context 

R3.[PSTATE_IP.A2] 

Fill ID slots 1n context 

R3 

R3,[PSTATE_I03.A2] 
R3 

R3.CPSTATE_I02.A2] 
101. R3 

R3,[PSTATE_I01.A21 
100. R3 
R3.CPSTATE_IO0,A2] 

F111 Rx slots In context 

C TEMPO .A01.R3 

R3.CPSTATE_R0.A2] 

C TEMPI. A0].R3 

R3.tPSTATE_R1.A2] 

[TEMP2.A01.R3 

R3,CPSTATE_R2.A2] 

CTEMP3.A0J.R3 

R3.CPSTATE_R3.A2] 

R1.TA6_CS,R3 

R3 . *XLATE_EXC_REQUEST_METHOO 



XLATE_EXC_LOOKUP METHOO: 
MOVE NNR.R3 

SENK ^" L '-«««^_LEN.BITS„3 

SENOE C0BJECT.ID.A21 
SUSPENO 

XLATE_EXC_REQUEST METHOO: 

OC VAR_RCACHE_BAS€ 

MOVE CR0.AOLR2 

OC VAR_MCACHE_LENGTH 

MOVE CR0.A0],R3 

MOVE NIL.RO 

MOVE R0.CTEMP4.A0] 

POP I 

POP R1 

Now R1 holds the method ID, R2 
the method cache, and R3 holds 
method cache 

ADO R2.R3.R2 



i RO <- Mask to keep Ian bits 
; R2 <- Length of msg 
; RO <- Length * 2 words hdr 
; R1 <- Class for copied msg 
; Make an object to hold msg 
; A2 <- Address of object 
Push msg object ID on stack 

R1 <- Length ♦ 2 words hdr 

If no length, done copying 
Decrement source Index 
Decrement dest Index 
RO <- word from queue 
Copy Into msg object 
Loop 



No local space needed 
A2 <- Context address 

RO <- True 
Disable interrupts 
RO <- Pointer to ctxt 
Shift addr portion down 
Add pstate offset to addr 
Shift addr portion back up 
Add 1n length - i 
RO <- AOOR:<ps_addrXps len> 
A2 <- Pointer to pstate - 



; Context IP <- backed up IP 

; Point 103 to msg object 
; 102 1s on stack 



; Does Tag • class/selector? 
i If not. we were xlatmg an id 

; R3 <- This node number 

; RO <- header 

; Send node, header 

i RO <- 10 of LookupMethod code 

; Send LookupMethod ID,c/s 

; Send context to reply to 



: R2 <- Base of method cache 

; R3 <- Length of method cache 

; TEMP* <- NIL 

; Set R1 back (clean up later) 

holds the base of 
the length of the 

; R2 <- Offset past mcache 



XLATE_EXC_SEARCH_MC_ID: 



SUB 

SUB 

EO 

BT 

MOVE 

BNNIL 

MOVE 

BNNIL 

MOVE 



R2.2.R2 
R3.2.R3 

R1.CR2.A0].R0 

R0,~XLATE_EXC_FOUNO MC ID 
[R2.A0],R0 " " 

RO.~XLATE_EXC MC LOOP 
[TEMP*,A0).R0~ " 
RO,~XLATE_EXC MC LOOP 
R2,[TEMP*,A0]~ " 



XLATE_EXC_MC_LOOP: 

8NZ " R3,~XLATE_EXC SEARCH MC ID 
MOVE C"EMP*,A0],R0" ~ " 
BNNIL R0,~XLATE_EXC_GOT_ROOM 

XLATE_EXC_ENTER_IN_OVERFLOW LIST: 

MOVE R1.[CONT RESOURCE, A2] 

DC VAR_MCACHE_OVERFLOW LIST 

MOVE R0.R2 

MOVE CR0.A0],R0 

MOVE R0,[CONT_NEXT CONTEXT.A2] 

MOVE [OBJECT I0.A21.R0 

MOVE R0.CR2.A0] 

BR "XLATE_EXC_MAIL_ORDER_METHOO 

XLATE_EXC_GOT_ROOM: 

MOVE [TEMP*,A0],R2 

MOVE R1.CR2.A0] 

XLATE_EXC_FOUND MC 10: 

ADD R27l,R2 

MOVE [R2.A0],R0 



MOVE 
MOVE 
MOVE 



[OBJECT_ID,A2],R3 
R3.CR2.A0] 
R0,[CONT_NEXT_C0NTEXT,A2 ] 



; Decrement offset 

; Decrement length 

: Is this the id we want? 

; If so, add context to 11st 

; If entry not nil, loop again 

; If TEMP* 1s non-nil, loop 

; Entry 1s nil, so fill 

; TEMP* with offset to this 

; empty place. 

; If length !• 0, loop 

; If TEMP* not nil, we found an 
empty space 1n the table. 

Resource • Method ID 

RO <- Overflow 11st addr 

Copy to R2 

RO <- Car of overflow 11st 

Next context • rest of 11st 

RO <- Context-ID 

Of low 11st <- Context-ID 

Mall for method 



R2 <- Empty slot offset 
Fill MC ID with method ID 

Point offset to wait list 
RO <- (car wait-list) 
R3 <- Context- 10 
Point wa1t-Hst to context 
Point child slot to the 
rest of welt-list (or nil) 



Now we have set up the wait 11st for the method. 

We have to mall off a method request to the hometown 

node of the method 1n question (ID 1n Ri). no "" lo " n 



X LATE_EXC_MAI L_OROER_METHO0 



PUSH 
CALL 
MOVE 
POP 
DC 

SEN02 
REAOR 
SEN02E 
SUSPEND 
XLATE EXC END: 



. Save TO 

TRAP I0_TO_NOOE .' R1 ^Sod. numb , p of „ 

I]'* 3 ; Move to R3 

MaJUMETHO0_RE0UEST_MSG«SYS_LENlBITSH3TsYS UNC 

NNR.R3 ; |! nd «•»} "<"• * • »»»»ege 

si 03 • " 3 <_ Tni« node number 

; Send method- ID 4 this node * 

; Walt for method reply 



EXC_VECTORS: 
DC 
DC 
DC 
DC 
DC 
DC 
X 
DC 
DC 
DC 
X 
DC 
DC 
X 
X 
X 
X 
X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 

X 



IP 
IP 

IP: 

IP; 

IP: 

IP: 

IP: 

IP: 

IP: 

IP 

IP 

IP 

IP: 

IP: 

IP: 

IP: 

IP: 

IP 

IP: 

IP: 

IP: 

IP: 

IP: 

IP: 

IP: 

IP: 

IP: 

IP 

IP: 



:SYS_ABS, 
>SYS_ABS| 
:SYS_ABS| 
SYS_ABS| 
SYS_ABS| 
:SYS_ABS| 
:SYS_ABS| 
:SYS_ABS| 
:SYS_ABS| 
:SYS_A8S| 
:SYS_ABS| 
iSYSJJNCI 
:SYS_A8S| 
:SYS_A8S| 
:SYS_A8S| 
rS¥5_ABS| 
:SYS_ABS| 
:SYS_ABS| 
:SYS_ABS| 
SYS A8S| 
SYS.ABSI 
SYS_ABS| 
SYS ABSi 
SYS ABS| 
SYS ABSI 
SYS ABS| 
SYS ABSI 
SYS_ABS| 
SYS ABS 



(8KG0_EXC«SYS LEN BITS) 
!ES^- F * OLT<<8V8 - C8M -MTS) 
/ < SSn- FAULT<<SYS - LEN -BITS ) 

£££- FAULT<<SYS -LEN_BITS) 
(EMPTY_FAULT«SYS LEN BITS) 
(EARLY_EXC«SYS U?N BITS) 
<f"fJ Y - FAU >-T<<SYS_LEN_BrrS) 
< £fTY_FAULT«SYS_LEN BITS ) 
<OfJY.FAULT«SYS_LEN~BITS ) 
(EHPTY_FAULT«SYS LEN~BITS) 
(SENO_EXC«SYS LEN BITS) 
S S=J5 S|(XUTE - E)< C<<SYS LEN 
(EMPTY_FAULT«SYS LEN BITS)" 
(PySH_EXC«SYS_LEN BITS) 
(POP_EXC«SYS_LEN BITS) 

££I!- F * ULT<<SYS - LEN -8ITS ) 
(OfJY_FAULT«SYS_LEN BITS) 
(EMPTY_FAULT«SYS LEN~BITS) 

(empty_fault«sys len~bits) 
( ehty_pault«sys"len"bits ) 
( empty_fault«sys"len"bits ) 

(EMPTY_FAULT«SYS LEN"BITS) 
( EMPTY_FAULT«SYS_LEN~BITS ) 

(empty_fault«sys len~bits) 
( empty_fault«sys"len"bits ) 
( empty_fault«sys j.en"bits ) 
( empty_fault«sys_len"bits ) 

(EMPTY_FAULT«SYS LEN - BITS) 

( empty_fault«sysIlen~bits ) 



J DBLFAULT 
i IL6INST 
; ILQADRMD 
; ACCESS 

i LIMIT 

; INVADR 

: MSG 

; QUEUE 

.BITS) 
; RANGE 



OVERFLOW 

TYPE 

IA 

IB 

IC 

ID 

IE 

IF 



        DC      IP:SYS_ABS|(EMPTY_FAULT<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_FAULT<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_FAULT<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(NEW_CONTEXT_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(FREE_CONTEXT_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(XFER_ID_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(XFER_ADDR_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(ID_TO_NODE_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(NEW_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(MALLOC_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(GENID_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(VERSION_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(BRAT_PEEK_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(SWEEP_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(FREE_SPECIFIED_CONTEXT_TRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_TRAP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_TRAP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(XCALL_TRP<<SYS_LEN_BITS)

EXC.VECtSs.ENO: IP:< ° IE - TRP<<SYS - LEN - BITS> " ' 

XCALL.VECTORS: 

        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(BRAT_ENTER_XTRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(BRAT_XLATE_XTRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(BRAT_PURGE_XTRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(MIGRATE_OBJECT_XTRP<<SYS_LEN_BITS)
        DC      IP:SYS_UNC|SYS_ABS|(BRAT_ENTER_NEW_XTRP<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)
        DC      IP:SYS_ABS|(EMPTY_XCALL<<SYS_LEN_BITS)

XCALL_VECTORS_ENOr !SYS - ABS,< °* TY - XCAU<<SYS:L£N - BITS ' 



ROM Constants 

ROM_VERSION:    DC      INT:(1<<16)|0

™£}l Es 0C INT:(R0M_ENO - 1024) 

TWIDDLE:        DC      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

ROM_END:

END 
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JOSS Quick Reference 



Primitive Message Handlers

Name             Arguments                               Description

WRITE            <dest-address> <data>*                  Fills the block of memory at
                                                         <dest-address> with the data
                                                         contained in the message. The
                                                         <dest-address> word must be a
                                                         proper ADDR-tagged value.

READ             <src-address> <reply-node> <reply-hdr>  Reads the block of memory
                                                         starting at <src-address> and
                                                         mails the data back to the
                                                         <reply-node> in a message
                                                         whose header is <reply-hdr>.

CALL             <method-id> <args>*                     Starts up the method with ID
                                                         <method-id>. The <args> are
                                                         used by the task being started.

SEND             <selector> <receiver-id> <args>*        Starts up the method that
                                                         performs the operation indicated
                                                         by <selector> on the object with
                                                         ID <receiver-id>. The process
                                                         started uses the <args>.

REPLY            <context-id> <context-slot> <value>     Places a value in the specified
                                                         slot <context-slot> of the context
                                                         with ID <context-id>. If the
                                                         context was waiting for this slot,
                                                         it will be restarted.

NEW_METHOD       <class> <selector> <code>*              Allocates storage for a new
                                                         method, copies the <code> into
                                                         the method object, and installs
                                                         the <class> and <selector> to
                                                         method ID bindings in the
                                                         system table.

NEW              <size> <class> <id> <selector> <data>*  Allocates a new object of type
                                                         <class> on a remote node with
                                                         length <size>, copies the
                                                         optional <data> into the object,
                                                         and when done, sends the
                                                         <selector> to the object with
                                                         ID <id>.

RESTART_CONTEXT  <context-id>                            Queues the context with ID
                                                         <context-id> for execution.

MIGRATE_OBJECT   <object-id> <node-number>               Moves the object with ID
                                                         <object-id> to node number
                                                         <node-number>.
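
The REPLY and RESTART_CONTEXT entries above can be illustrated with a small model. This is only a sketch in Python, not the JOSS implementation: the Context class, the waiting_on field, the contexts dictionary, and the run_queue are all hypothetical names introduced here to show the "fill the slot, restart if the context was waiting on it" behavior.

```python
from collections import deque


class Context:
    """Hypothetical stand-in for a JOSS context object."""

    def __init__(self, context_id, num_slots):
        self.context_id = context_id
        self.slots = [None] * num_slots
        # Slot index this context is suspended on, or None if not waiting.
        self.waiting_on = None


contexts = {}        # assumed lookup from context ID to context
run_queue = deque()  # assumed queue of context IDs ready to execute


def handle_reply(context_id, context_slot, value):
    """Model of REPLY: store the value, restart the context if it was waiting."""
    ctx = contexts[context_id]
    ctx.slots[context_slot] = value
    if ctx.waiting_on == context_slot:
        ctx.waiting_on = None
        run_queue.append(context_id)


def handle_restart_context(context_id):
    """Model of RESTART_CONTEXT: queue the context for execution."""
    run_queue.append(context_id)


# Usage: context 7 is suspended on slot 2.
ctx = Context(context_id=7, num_slots=4)
ctx.waiting_on = 2
contexts[7] = ctx
handle_reply(7, 1, 10)  # fills slot 1; the context stays suspended
handle_reply(7, 2, 42)  # fills slot 2; the context is queued
```

In this model a reply to a slot that the context is not waiting on simply stores the value, which matches the table's wording that a restart happens only if the context was waiting for that slot.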



System Calls

Name                    Arguments                    Description

XCALL                   Xcall routine number in R3   Calls one of the routines defined in
                                                     the extended call vector table. This
                                                     was implemented because the CALL
                                                     vector table was running out of room.

SWEEP                   -                            Compacts the heap.

NEW_CONTEXT             Size of user space in R0     Creates a new context object with R0
                                                     words of user space and returns the
                                                     context address in A1 and A2. R0 is
                                                     trashed.

NEW                     Size of object in R0         Creates a new object of size R0 and
                        Class of object in R1        class R1, and returns the object's ID
                                                     in R0. R1 is trashed.

ID_TO_NODE              Object ID in R1              Returns in R1 a likely node for the
                                                     object whose ID is in R1.

MALLOC                  Block size in R0             Allocates R0 words of physical
                                                     memory and returns the address in
                                                     A2.

FREE_CONTEXT            Context ID to free in ID1    Frees the context with ID in ID1,
                                                     possibly placing it on the context
                                                     free list.

FREE_SPECIFIED_CONTEXT  Context ID to free in R0     Frees the context with ID in R0,
                                                     possibly placing it on the context
                                                     free list. This trashes R0 and R1.

GENID                   -                            Generates a new ID and returns it
                                                     in R0.

VERSION                 -                            Returns the OS version number in
                                                     R0, where the high 16 bits hold the
                                                     major value, and the low 16 bits the
                                                     minor value.

XFER_ID                 Context ID to restart in R0  Transfers control to the context whose
                                                     ID is in R0. This never returns.

XFER_ADDR               Context address in A1        Transfers control to the context whose
                                                     address is in A1. This never returns.

BRAT_PEEK               ID to hash in R0             Hashes the ID in R0 to find a first
                        ID to search for in R1       slot in the BRAT to search. A linear
                        Base of BRAT table in A2     search proceeds from there until the ID
                                                     in R1 is found. When found, the offset
                                                     from the start of the BRAT where this
                                                     entry is located is returned. If not
                                                     found, NIL is returned.
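
The VERSION entry above describes a word whose high 16 bits hold the major value and whose low 16 bits hold the minor value; the ROM constants define the version word as (1<<16)|0. The following Python sketch shows that packing and unpacking. The helper names pack_version and unpack_version are hypothetical and are not part of JOSS.

```python
def pack_version(major, minor):
    """Pack major/minor into one word: major in the high 16 bits, minor in the low 16."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def unpack_version(word):
    """Split a version word into a (major, minor) pair."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


# The value given for the version word in the ROM constants.
ROM_VERSION = (1 << 16) | 0

major, minor = unpack_version(ROM_VERSION)
print(major, minor)
```

Under this layout the ROM constant corresponds to major version 1 and minor version 0.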



Extended System Calls 



Name            Arguments                        Description

BRAT_ENTER      ID to enter in BRAT in R0        Enters the ID/ADDR pair
                Address in R1                    R0/R1 into the BRAT.

BRAT_XLATE      ID to look up in BRAT in R0      Looks R0 up in the BRAT and
                                                 returns the bound value in R0.

BRAT_PURGE      ID to purge from BRAT in R0      Removes the first binding of R0 from
                                                 the BRAT.

MIGRATE_OBJECT  ID of object to migrate in R0    Migrates the object whose ID is in R0
                Node to migrate object to in R1  to the node whose number is in R1.
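
The BRAT calls (BRAT_PEEK, BRAT_ENTER, BRAT_XLATE, BRAT_PURGE) describe a hashed table searched linearly from the hashed slot. The Python sketch below models that behavior under several assumptions that the reference does not state: the table size BRAT_SLOTS, the modulo hash in brat_hash, wrap-around at the end of the table, insertion into the first free slot, and returning a slot index rather than a word offset. None stands in for NIL.

```python
NIL = None       # stand-in for the NIL value returned on a failed lookup
BRAT_SLOTS = 8   # hypothetical table size; the real size is not given here


def brat_hash(ident):
    """Hypothetical hash; the actual JOSS hash function is not shown here."""
    return ident % BRAT_SLOTS


def brat_peek(table, hash_id, search_id):
    """Hash hash_id to pick a first slot, then search linearly for search_id."""
    start = brat_hash(hash_id)
    for step in range(BRAT_SLOTS):
        slot = (start + step) % BRAT_SLOTS  # assumed wrap-around
        entry = table[slot]
        if entry is not None and entry[0] == search_id:
            return slot
    return NIL


def brat_enter(table, ident, addr):
    """Enter the ID/ADDR pair in the first free slot at or after the hashed slot."""
    start = brat_hash(ident)
    for step in range(BRAT_SLOTS):
        slot = (start + step) % BRAT_SLOTS
        if table[slot] is None:
            table[slot] = (ident, addr)
            return slot
    raise MemoryError("BRAT is full")


def brat_xlate(table, ident):
    """Return the address bound to ident, or NIL if there is no binding."""
    slot = brat_peek(table, ident, ident)
    return NIL if slot is NIL else table[slot][1]


def brat_purge(table, ident):
    """Remove the first binding of ident found by the search, if any."""
    slot = brat_peek(table, ident, ident)
    if slot is not NIL:
        table[slot] = None


# Usage: IDs 3 and 11 hash to the same slot, so 11 lands in the next one.
brat = [None] * BRAT_SLOTS
brat_enter(brat, 3, 0x100)
brat_enter(brat, 11, 0x200)
```

Because brat_peek in this sketch scans every slot rather than stopping at the first empty one, purging an entry does not hide later entries that collided with it. Whether the real table stops early is not stated in the reference.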